351
|
Genomic mechanisms of climate adaptation in polyploid bioenergy switchgrass. Nature 2021; 590:438-444. [PMID: 33505029 PMCID: PMC7886653 DOI: 10.1038/s41586-020-03127-1] [Citation(s) in RCA: 111] [Impact Index Per Article: 27.8] [Received: 07/01/2020] [Accepted: 12/16/2020] [Indexed: 01/30/2023]
Abstract
Long-term climate change and periodic environmental extremes threaten food and fuel security [1] and global crop productivity [2-4]. Although molecular and adaptive breeding strategies can buffer the effects of climatic stress and improve crop resilience [5], these approaches require sufficient knowledge of the genes that underlie productivity and adaptation [6], knowledge that has been limited to a small number of well-studied model systems. Here we present the assembly and annotation of the large and complex genome of the polyploid bioenergy crop switchgrass (Panicum virgatum). Analysis of biomass and survival among 732 resequenced genotypes, which were grown across 10 common gardens that span 1,800 km of latitude, jointly revealed extensive genomic evidence of climate adaptation. Climate-gene-biomass associations were abundant but varied considerably among deeply diverged gene pools. Furthermore, we found that gene flow accelerated climate adaptation during the postglacial colonization of northern habitats through introgression of alleles from a pre-adapted northern gene pool. The polyploid nature of switchgrass also enhanced adaptive potential through the fractionation of gene function, as there was an increased level of heritable genetic diversity on the nondominant subgenome. In addition to investigating patterns of climate adaptation, the genome resources and gene-trait associations developed here provide breeders with the necessary tools to increase switchgrass yield for the sustainable production of bioenergy.
|
352
|
Zarate S, Carroll A, Mahmoud M, Krasheninina O, Jun G, Salerno WJ, Schatz MC, Boerwinkle E, Gibbs RA, Sedlazeck FJ. Parliament2: Accurate structural variant calling at scale. Gigascience 2020; 9:giaa145. [PMID: 33347570 PMCID: PMC7751401 DOI: 10.1093/gigascience/giaa145] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 04/03/2020] [Revised: 09/17/2020] [Accepted: 11/18/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. The current standard approach relies on short paired-end reads, from which SVs remain challenging to detect, especially at the scale of hundreds to thousands of samples. FINDINGS We present Parliament2, a consensus SV framework that leverages multiple best-in-class methods to identify high-quality SVs from short-read DNA sequence data at scale. Parliament2 incorporates pre-installed SV callers that are optimized for efficient execution in parallel to reduce the overall runtime and costs. We demonstrate the accuracy of Parliament2 when applied to data from NovaSeq and HiSeq X platforms with the Genome in a Bottle (GIAB) SV call set across all size classes. The reported quality score per SV is calibrated across different SV types and size classes. Parliament2 has the highest F1 score (74.27%) measured across the independent gold standard from GIAB. We illustrate the compute performance by processing all 1000 Genomes samples (2,691 samples) in <1 day on GRCh38. Parliament2 improves the runtime performance of individual methods and is open source (https://github.com/slzarate/parliament2), and a Docker image, as well as a WDL implementation, is available. CONCLUSION Parliament2 provides a highly accurate single-sample SV call set from short-read DNA sequence data and enables cost-efficient application over cloud or cluster environments, processing thousands of samples.
Affiliation(s)
- Samantha Zarate
- DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
| | - Andrew Carroll
- DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Olga Krasheninina
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Goo Jun
- Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
| | - William J Salerno
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Michael C Schatz
- Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
| | - Eric Boerwinkle
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
|
353
|
Kou Y, Liao Y, Toivainen T, Lv Y, Tian X, Emerson JJ, Gaut BS, Zhou Y. Evolutionary Genomics of Structural Variation in Asian Rice (Oryza sativa) Domestication. Mol Biol Evol 2020; 37:3507-3524. [PMID: 32681796 PMCID: PMC7743901 DOI: 10.1093/molbev/msaa185] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/21/2022] Open
Abstract
Structural variants (SVs) are a largely unstudied feature of plant genome evolution, despite the fact that SVs contribute substantially to phenotypes. In this study, we discovered SVs across a population sample of 347 high-coverage, resequenced genomes of Asian rice (Oryza sativa) and its wild ancestor (O. rufipogon). In addition to this short-read data set, we also inferred SVs from whole-genome assemblies and long-read data. Comparisons among data sets revealed different features of genome variability. For example, genome alignment identified a large (∼4.3 Mb) inversion in indica rice varieties relative to japonica varieties, and long-read analyses suggest that ∼9% of genes from the outgroup (O. longistaminata) are hemizygous. We focused, however, on the resequencing sample to investigate the population genomics of SVs. Clustering analyses with SVs recapitulated the rice cultivar groups that were also inferred from SNPs. However, the site-frequency spectrum of each SV type (which included inversions, duplications, deletions, translocations, and mobile element insertions) was skewed toward lower frequency variants than synonymous SNPs, suggesting that SVs may be predominantly deleterious. Among transposable elements, SINE and mariner insertions were found at especially low frequency. We also used SVs to study domestication by contrasting between rice and O. rufipogon. Cultivated genomes contained ∼25% more derived SVs and mobile element insertions than O. rufipogon, indicating that SVs contribute to the cost of domestication in rice. Peaks of SV divergence were enriched for known domestication genes, but we also detected hundreds of genes gained and lost during domestication, some of which were enriched for traits of agronomic interest.
Affiliation(s)
- Yixuan Kou
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA
- Laboratory of Subtropical Biodiversity, Jiangxi Agricultural University, Nanchang, China
| | - Yi Liao
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA
| | - Tuomas Toivainen
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA
- Department of Agricultural Sciences, University of Helsinki, Helsinki, Finland
| | - Yuanda Lv
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA
| | - Xinmin Tian
- Department of Biological Sciences, College of Life Science and Technology, Xinjiang University, Urumqi, China
| | - J J Emerson
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA
| | - Brandon S Gaut
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA
| | - Yongfeng Zhou
- Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA
- Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
|
354
|
Genome-Wide Analysis of Off-Target CRISPR/Cas9 Activity in Single-Cell-Derived Human Hematopoietic Stem and Progenitor Cell Clones. Genes (Basel) 2020; 11:genes11121501. [PMID: 33322084 PMCID: PMC7762975 DOI: 10.3390/genes11121501] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/11/2020] [Revised: 11/28/2020] [Accepted: 12/11/2020] [Indexed: 12/11/2022] Open
Abstract
CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9)-mediated genome editing holds remarkable promise for the treatment of human genetic diseases. However, the possibility of off-target Cas9 activity remains a concern. To address this issue using clinically relevant target cells, we electroporated Cas9 ribonucleoprotein (RNP) complexes (independently targeted to two different genomic loci, the CXCR4 locus on chromosome 2 and the AAVS1 locus on chromosome 19) into human mobilized peripheral blood-derived hematopoietic stem and progenitor cells (HSPCs) and assessed the acquisition of somatic mutations in an unbiased, genome-wide manner via whole genome sequencing (WGS) of single-cell-derived HSPC clones. Bioinformatic analysis identified >20,000 total somatic variants (indels, single nucleotide variants, and structural variants) distributed among Cas9-treated and non-Cas9-treated control HSPC clones. Statistical analysis revealed no significant difference in the number of novel non-targeted indels among the samples. Moreover, data analysis showed no evidence of Cas9-mediated indel formation at 623 predicted off-target sites. The median number of novel single nucleotide variants was slightly elevated in Cas9 RNP-recipient sample groups compared to baseline, but did not reach statistical significance. Structural variants were rare and demonstrated no clear causal connection to Cas9-mediated gene editing procedures. We find that the collective somatic mutational burden observed within Cas9 RNP-edited human HSPC clones is indistinguishable from naturally occurring levels of background genetic heterogeneity.
|
355
|
Lee I, Razaghi R, Gilpatrick T, Molnar M, Gershman A, Sadowski N, Sedlazeck FJ, Hansen KD, Simpson JT, Timp W. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. Nat Methods 2020; 17:1191-1199. [PMID: 33230324 PMCID: PMC7704922 DOI: 10.1038/s41592-020-01000-7] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 02/15/2019] [Accepted: 10/17/2020] [Indexed: 12/30/2022]
Abstract
Probing epigenetic features on DNA has tremendous potential to advance our understanding of the phased epigenome. In this study, we use nanopore sequencing to evaluate CpG methylation and chromatin accessibility simultaneously on long strands of DNA by applying GpC methyltransferase to exogenously label open chromatin. We performed nanopore sequencing of nucleosome occupancy and methylome (nanoNOMe) on four human cell lines (GM12878, MCF-10A, MCF-7 and MDA-MB-231). The single-molecule resolution allows footprinting of protein and nucleosome binding, and determination of the combinatorial promoter epigenetic signature on individual molecules. Long-read sequencing makes it possible to robustly assign reads to haplotypes, allowing us to generate a fully phased human epigenome, consisting of chromosome-level allele-specific profiles of CpG methylation and chromatin accessibility. We further apply this to a breast cancer model to evaluate differential methylation and accessibility between cancerous and noncancerous cells.
Affiliation(s)
- Isac Lee
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Roham Razaghi
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Timothy Gilpatrick
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Michael Molnar
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
| | - Norah Sadowski
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Kasper D Hansen
- Department of Biostatistics, Johns Hopkins School of Public Health, Baltimore, MD, USA
| | - Jared T Simpson
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
|
356
|
Fatima N, Petri A, Gyllensten U, Feuk L, Ameur A. Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes. Genes (Basel) 2020; 11:E1444. [PMID: 33266238 PMCID: PMC7760597 DOI: 10.3390/genes11121444] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/12/2020] [Revised: 11/24/2020] [Accepted: 11/26/2020] [Indexed: 01/23/2023] Open
Abstract
Long-read single-molecule sequencing is increasingly used in human genomics research, as it allows accurate detection of large-scale DNA rearrangements such as structural variations (SVs) at high resolution. However, few studies have evaluated the performance of different single-molecule sequencing platforms for SV detection in human samples. Here we performed Oxford Nanopore Technologies (ONT) whole-genome sequencing of two Swedish human samples (average 32× coverage) and compared the results to previously generated Pacific Biosciences (PacBio) data for the same individuals (average 66× coverage). Our analysis inferred an average of 17k and 23k SVs from the ONT and PacBio data, respectively, with a majority of them overlapping with an available multi-platform SV dataset. When comparing the SV calls in the two Swedish individuals, we find a higher concordance between ONT and PacBio SVs detected in the same individual than between SVs detected by the same technology in different individuals. Downsampling of PacBio reads, performed to obtain similar coverage levels for all datasets, resulted in 17k SVs per individual and improved overlap with the ONT SVs. Our results suggest that ONT and PacBio have similar performance for SV detection in human whole-genome sequencing data, and that both technologies are feasible for population-scale studies.
Affiliation(s)
- Nazeefa Fatima
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
| | - Anna Petri
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
| | - Ulf Gyllensten
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
| | - Lars Feuk
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
| | - Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 752 36 Uppsala, Sweden
- Department of Epidemiology and Preventive Medicine, Monash University, Melbourne, Clayton, VIC 3800, Australia
|
357
|
Lin Y, Luo Y, Sun Y, Guo W, Zhao X, Xi Y, Ma Y, Shao M, Tan W, Gao G, Wu C, Lin D. Genomic and transcriptomic alterations associated with drug vulnerabilities and prognosis in adenocarcinoma at the gastroesophageal junction. Nat Commun 2020; 11:6091. [PMID: 33257699 PMCID: PMC7705019 DOI: 10.1038/s41467-020-19949-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/10/2020] [Accepted: 11/08/2020] [Indexed: 02/08/2023] Open
Abstract
Adenocarcinoma at the gastroesophageal junction (ACGEJ) has dismal clinical outcomes, and there are currently few specific effective therapies because of limited knowledge of its genomic and transcriptomic alterations. The present study investigates genomic and transcriptomic changes in ACGEJ from Chinese patients and analyzes their drug vulnerabilities and associations with survival time. Here we show that the major genomic changes of Chinese ACGEJ patients are chromosome instability-promoted tumorigenic focal copy-number variations and COSMIC Signature 17-featured single nucleotide variations. We provide a comprehensive profile of genetic changes that are potentially vulnerable to existing therapeutic agents and identify the Signature 17-correlated IFN-α response pathway as a prognostic marker that might have practical value for clinical prognosis of ACGEJ. These findings further our understanding of the molecular biology of ACGEJ and may help develop more effective therapeutic strategies. Adenocarcinoma at the gastroesophageal junction has a dismal prognosis and few drug options. Here, the authors present genomic and transcriptomic features and potential therapeutic targets and prognostic biomarkers of Chinese and Caucasian tumours, and reveal the molecular similarities.
Affiliation(s)
- Yuan Lin
- Beijing Advanced Innovation Center for Genomics (ICG), Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
| | - Yingying Luo
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yanxia Sun
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Wenjia Guo
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Cancer Institute, Affiliated Cancer Hospital of Xinjiang Medical University, Urumqi, China
| | - Xuan Zhao
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yiyi Xi
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yuling Ma
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Mingming Shao
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Wen Tan
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Ge Gao
- Beijing Advanced Innovation Center for Genomics (ICG), Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing, China
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing, China
| | - Chen Wu
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China
- CAMS Key Laboratory of Genetics and Genomic Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Dongxin Lin
- Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China
- Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Guangzhou, China
|
358
|
Guo J, Cao K, Deng C, Li Y, Zhu G, Fang W, Chen C, Wang X, Wu J, Guan L, Wu S, Guo W, Yao JL, Fei Z, Wang L. An integrated peach genome structural variation map uncovers genes associated with fruit traits. Genome Biol 2020; 21:258. [PMID: 33023652 PMCID: PMC7539501 DOI: 10.1186/s13059-020-02169-y] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 03/20/2020] [Accepted: 09/23/2020] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Genome structural variations (SVs) have been associated with key traits in a wide range of agronomically important species; however, SV profiles of peach and their functional impacts remain largely unexplored. RESULTS Here, we present an integrated map of 202,273 SVs from 336 peach genomes. A substantial number of SVs have been selected during peach domestication and improvement, which together affect 2268 genes. Genome-wide association studies of 26 agronomic traits using these SVs identify a number of candidate causal variants. A 9-bp insertion in Prupe.4G186800, which encodes a NAC transcription factor, is shown to be associated with early fruit maturity, and a 487-bp deletion in the promoter of PpMYB10.1 is associated with flesh color around the stone. In addition, a 1.67 Mb inversion is highly associated with fruit shape, and a gene adjacent to the inversion breakpoint, PpOFP1, regulates flat shape formation. CONCLUSIONS The integrated peach SV map and the identified candidate genes and variants represent valuable resources for future genomic research and breeding in peach.
Affiliation(s)
- Jian Guo
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
- College of Horticulture & Forestry Sciences, Huazhong Agricultural University, Wuhan, China
| | - Ke Cao
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
| | - Cecilia Deng
- The New Zealand Institute for Plant & Food Research Limited, Private Bag 92169, Auckland, 1142, New Zealand
| | - Yong Li
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
| | - Gengrui Zhu
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
| | - Weichao Fang
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
| | - Changwen Chen
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
| | - Xinwei Wang
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
| | - Jinlong Wu
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
| | - Liping Guan
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
| | - Shan Wu
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY, USA
| | - Wenwu Guo
- College of Horticulture & Forestry Sciences, Huazhong Agricultural University, Wuhan, China
| | - Jia-Long Yao
- The New Zealand Institute for Plant & Food Research Limited, Private Bag 92169, Auckland, 1142, New Zealand
| | - Zhangjun Fei
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY, USA
- US Department of Agriculture-Agricultural Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, NY, USA
| | - Lirong Wang
- Zhengzhou Fruit Research Institute, Chinese Academy of Agricultural Sciences, Zhengzhou, China
|
359
|
Hall A, Bandres-Ciga S, Diez-Fairen M, Quinn JP, Billingsley KJ. Genetic Risk Profiling in Parkinson's Disease and Utilizing Genetics to Gain Insight into Disease-Related Biological Pathways. Int J Mol Sci 2020; 21:E7332. [PMID: 33020390 PMCID: PMC7584037 DOI: 10.3390/ijms21197332] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/01/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 12/18/2022] Open
Abstract
Parkinson's disease (PD) is a complex disorder underpinned by both environmental and genetic factors. The latter only began to be understood around two decades ago, but since then great inroads have rapidly been made into deconvoluting the genetic component of PD. In particular, recent large-scale projects such as genome-wide association (GWA) studies have provided insight into the genetic risk factors associated with genetically "complex" PD (PD that cannot readily be attributed to single deleterious mutations). Here, we discuss the plethora of genetic information provided by PD GWA studies and how this may be utilized to generate polygenic risk scores (PRS), which may be used in the prediction of risk and trajectory of PD. We also comment on how pathway-specific genetic profiling can be used to gain insight into PD-related biological pathways, and how this may be further utilized to nominate causal PD genes and potentially druggable therapeutic targets. Finally, we outline the current limits of our understanding of PD genetics and the potential contribution of variation currently uncaptured in genetic studies, focusing here on uncatalogued structural variants.
Affiliation(s)
- Ashley Hall
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular & Integrative Biology, University of Liverpool, L69 7BE, UK
| | - Sara Bandres-Ciga
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
| | - Monica Diez-Fairen
- Neurogenetics Group, University Hospital MutuaTerrassa, Sant Antoni 19, 08221 Terrassa, Barcelona, Spain
| | - John P. Quinn
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular & Integrative Biology, University of Liverpool, L69 7BE, UK
| | - Kimberley J. Billingsley
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
|
360
|
Aganezov S, Goodwin S, Sherman RM, Sedlazeck FJ, Arun G, Bhatia S, Lee I, Kirsche M, Wappel R, Kramer M, Kostroff K, Spector DL, Timp W, McCombie WR, Schatz MC. Comprehensive analysis of structural variants in breast cancer genomes using single-molecule sequencing. Genome Res 2020; 30:1258-1273. [PMID: 32887686 PMCID: PMC7545150 DOI: 10.1101/gr.260497.119] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 12/22/2019] [Accepted: 08/07/2020] [Indexed: 12/14/2022]
Abstract
Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of the disease and its progression. We performed whole-genome sequencing of the SKBR3 breast cancer cell line and patient-derived tumor and normal organoids from two breast cancer patients using Illumina/10x Genomics, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT) sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings show that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long reads even at relatively low coverage (25×–30×). Furthermore, we integrated SV and CNV data into a unifying karyotype-graph structure to present a more accurate representation of the mutated cancer genomes. We find hundreds of variants within known cancer-related genes detectable only through long-read sequencing. These findings highlight the need for long-read sequencing of cancer genomes for the precise analysis of their genetic instability.
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Rachel M Sherman
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Gayatri Arun
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Sonam Bhatia
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Isac Lee
- Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | - Robert Wappel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Melissa Kramer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | | | - David L Spector
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
| | - Winston Timp
- Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21211, USA
| | | | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21211, USA.,Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.,Department of Biology, Johns Hopkins University, Baltimore, Maryland 21211, USA
| |
|
361
|
López-Girona E, Davy MW, Albert NW, Hilario E, Smart MEM, Kirk C, Thomson SJ, Chagné D. CRISPR-Cas9 enrichment and long read sequencing for fine mapping in plants. PLANT METHODS 2020; 16:121. [PMID: 32884578 PMCID: PMC7465313 DOI: 10.1186/s13007-020-00661-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/06/2020] [Accepted: 08/18/2020] [Indexed: 05/03/2023]
Abstract
BACKGROUND Genomic methods for identifying causative variants for trait loci applicable to a wide range of germplasm are required for plant biologists and breeders to understand the genetic control of trait variation. RESULTS We implemented Cas9-targeted sequencing for fine-mapping in apple, a method combining CRISPR-Cas9 targeted cleavage of a region of interest, followed by enrichment and long-read sequencing using Oxford Nanopore Technologies (ONT). We demonstrated the capability of this methodology to specifically cleave and enrich a plant genomic locus spanning 8 kb. The repeated mini-satellite motif located upstream of the Malus × domestica (apple) MYB10 transcription factor gene, causing red fruit colouration when present in a heterozygous state, was our exemplar to demonstrate the efficiency of this method: it contains a genomic region with a long structural variant normally ignored by short-read sequencing technologies. Cleavage specificity of the guide RNAs was demonstrated using polymerase chain reaction products, before using them to specify cleavage of high molecular weight apple DNA. An enriched library was subsequently prepared and sequenced using an ONT MinION flow cell (R.9.4.1). Of the 7,056 ONT reads base-called using both Albacore2 (v2.3.4) and Guppy (v3.2.4), with median lengths of 9.78 and 9.89 kb, respectively, 85.35% and 91.38% aligned to the reference apple genome. Of the aligned reads, 2.98% and 3.04% were on-target with read depths of 180× and 196× for Albacore2 and Guppy, respectively, and only five genomic loci were off-target with read depth greater than 25×, which demonstrated the efficiency of the enrichment method and specificity of the CRISPR-Cas9 cleavage. CONCLUSIONS We demonstrated that this method can isolate and resolve single-nucleotide and structural variants at the haplotype level in plant genomic regions.
The combination of CRISPR-Cas9 target enrichment and ONT sequencing provides a more efficient technology for fine-mapping loci than genome-walking approaches.
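As a rough consistency check on the figures in this abstract, the reported percentages can be turned into approximate read counts. The sketch below assumes that each on-target read spans the enriched locus, so that the number of on-target reads approximates the on-target read depth; that assumption, and all function and variable names, are illustrative and are not taken from the paper.

```python
# Approximate on-target read counts from the percentages quoted in the abstract.
TOTAL_READS = 7056

# (fraction of reads aligned, fraction of aligned reads on-target) per base-caller
reported = {
    "Albacore2": (0.8535, 0.0298),
    "Guppy": (0.9138, 0.0304),
}


def on_target_reads(total, aligned_fraction, on_target_fraction):
    """Estimate the number of on-target reads from the two reported fractions."""
    return total * aligned_fraction * on_target_fraction


estimates = {
    caller: on_target_reads(TOTAL_READS, aligned, on_target)
    for caller, (aligned, on_target) in reported.items()
}

for caller, reads in estimates.items():
    print(f"{caller}: about {reads:.0f} on-target reads")
```

Under that assumption, the estimates fall close to the read depths reported for the two base-callers.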
Affiliation(s)
- Elena López-Girona
- The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
| | | | - Nick W. Albert
- The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
| | | | - Maia E. M. Smart
- The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
| | - Chris Kirk
- The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
| | | | - David Chagné
- The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research), Private Bag 11600, Palmerston North, 4442 New Zealand
| |
|
362
|
North HL, Caminade P, Severac D, Belkhir K, Smadja CM. The role of copy-number variation in the reinforcement of sexual isolation between the two European subspecies of the house mouse. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190540. [PMID: 32654648 PMCID: PMC7423270 DOI: 10.1098/rstb.2019.0540] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Accepted: 04/04/2020] [Indexed: 12/24/2022] Open
Abstract
Reinforcement has the potential to generate strong reproductive isolation through the evolution of barrier traits as a response to selection against maladaptive hybridization, but the genetic changes associated with this process remain largely unexplored. Building upon the increasing evidence for a role of structural variants in adaptation and speciation, we addressed the role of copy-number variation in the reinforcement of sexual isolation evidenced between the two European subspecies of the house mouse. We characterized copy-number divergence between populations of Mus musculus musculus that display assortative mate choice, and those that do not, using whole-genome resequencing data. Updating methods to detect deletions and tandem duplications (collectively: copy-number variants, CNVs) in Pool-Seq data, we developed an analytical pipeline dedicated to identifying genomic regions showing the expected pattern of copy-number displacement under a reinforcement scenario. This strategy allowed us to detect 1824 deletions and seven tandem duplications that showed extreme differences in frequency between behavioural classes across replicate comparisons. A subset of 480 deletions and four tandem duplications were specifically associated with the derived trait of assortative mate choice. These 'Choosiness-associated' CNVs occur in hundreds of genes. Consistent with our hypothesis, such genes included olfactory receptors potentially involved in the olfactory-based assortative mate choice in this system as well as one gene, Sp110, that is known to show patterns of differential expression between behavioural classes in an organ used in mate choice, the vomeronasal organ. These results demonstrate that fine-scale structural changes are common and highly variable within species, despite being under-studied, and may be important targets of reinforcing selection in this system and others.
This article is part of the theme issue 'Towards the completion of speciation: the evolution of reproductive isolation beyond the first barriers'.
Affiliation(s)
- Henry L. North
- Institut des Sciences de l'Evolution (UMR 5554 CNRS, IRD, EPHE, Université de Montpellier), Université de Montpellier, Campus Triolet, Place Eugène Bataillon, 34095 Montpellier, France
| | - Pierre Caminade
- Institut des Sciences de l'Evolution (UMR 5554 CNRS, IRD, EPHE, Université de Montpellier), Université de Montpellier, Campus Triolet, Place Eugène Bataillon, 34095 Montpellier, France
| | - Dany Severac
- MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, 141 rue de la cardonille, 34094 Montpellier Cedex 5, France
| | - Khalid Belkhir
- Institut des Sciences de l'Evolution (UMR 5554 CNRS, IRD, EPHE, Université de Montpellier), Université de Montpellier, Campus Triolet, Place Eugène Bataillon, 34095 Montpellier, France
| | - Carole M. Smadja
- Institut des Sciences de l'Evolution (UMR 5554 CNRS, IRD, EPHE, Université de Montpellier), Université de Montpellier, Campus Triolet, Place Eugène Bataillon, 34095 Montpellier, France
| |
|
363
|
Jiang T, Liu Y, Jiang Y, Li J, Gao Y, Cui Z, Liu Y, Liu B, Wang Y. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol 2020; 21:189. [PMID: 32746918 PMCID: PMC7477834 DOI: 10.1186/s13059-020-02107-y] [Citation(s) in RCA: 208] [Impact Index Per Article: 41.6] [Received: 12/11/2019] [Accepted: 07/14/2020] [Indexed: 01/01/2023] Open
Abstract
Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV.
Affiliation(s)
- Tao Jiang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yongzhuang Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yue Jiang
- Nebula Genomics, Harbin, 150030, Heilongjiang, China
| | - Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, Guangdong, China
| | - Yan Gao
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Zhe Cui
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Yadong Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China
| | - Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
| | - Yadong Wang
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, Heilongjiang, China.
| |
|
364
|
Wang TY, Yang R. ScanITD: Detecting internal tandem duplication with robust variant allele frequency estimation. Gigascience 2020; 9:giaa089. [PMID: 32852038 PMCID: PMC7450668 DOI: 10.1093/gigascience/giaa089] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/04/2020] [Revised: 07/28/2020] [Accepted: 07/30/2020] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Internal tandem duplications (ITDs) are tandem duplications within coding exons and are important prognostic markers and drug targets for acute myeloid leukemia (AML). Next-generation sequencing has enabled the discovery of ITD at single-nucleotide resolution. ITD allele frequency is used in the risk stratification of patients with AML; higher ITD allele frequency is associated with poorer clinical outcomes. However, the ITD allele frequency data are often unavailable to treating physicians and the detection of ITDs with accurate variant allele frequency (VAF) estimation remains challenging for short-read sequencing. RESULTS Here we present the ScanITD approach, which performs a stepwise seed-and-realignment procedure for ITD detection with accurate VAF prediction. The evaluations on simulated and real data demonstrate that ScanITD outperforms 3 state-of-the-art ITD detectors, especially for VAF estimation. Importantly, ScanITD yields better accuracy than general-purpose structural variation callers for predicting ITD size range duplications. CONCLUSIONS ScanITD enables the accurate identification of ITDs with robust VAF estimation. ScanITD is written in Python and is open-source software that is freely accessible at https://github.com/ylab-hi/ScanITD.
Affiliation(s)
- Ting-You Wang
- The Hormel Institute, University of Minnesota, 801 16th Ave NE, Austin, MN 55912, USA
| | - Rendong Yang
- The Hormel Institute, University of Minnesota, 801 16th Ave NE, Austin, MN 55912, USA
- Masonic Cancer Center, University of Minnesota, 425 E. River Pkwy, Minneapolis, MN 55455, USA
| |
|
365
|
Luo J, Chen R, Zhang X, Wang Y, Luo H, Yan C, Huo Z. LROD: An Overlap Detection Algorithm for Long Reads Based on k-mer Distribution. Front Genet 2020; 11:632. [PMID: 32849762 PMCID: PMC7403501 DOI: 10.3389/fgene.2020.00632] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/04/2019] [Accepted: 05/26/2020] [Indexed: 11/13/2022] Open
Abstract
Third-generation sequencing technologies can produce large numbers of long reads, which have been widely used in many fields. When using long reads for genome assembly, overlap detection between any pair of long reads is an important step. However, the sequencing error rate of third-generation sequencing technologies is very high, and obtaining accurate overlap detection results is still a challenging task. In this study, we present a long-read overlap detection (LROD) algorithm that can improve the accuracy of overlap detection results. To detect overlaps between two long reads, LROD first retains only the solid common k-mers between them. These k-mers can simplify the process of overlap detection. Second, LROD finds a chain (i.e., candidate overlap) that includes the consistent common k-mers. In this step, LROD proposes a two-stage strategy to evaluate whether two common k-mers are consistent. Finally, LROD uses a novel strategy to determine whether the candidate overlaps are true and to revise them. To verify the performance of LROD, three simulated and three real long-read datasets are used in the experiments. Compared with two other popular methods (MHAP and Minimap2), LROD can achieve good performance in terms of the F1-score, precision and recall. LROD is available from https://github.com/luojunwei/LROD.
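The first two steps described in this abstract (keeping common k-mers, then chaining the consistent ones) can be illustrated with a minimal sketch. This is not the LROD implementation: here "solid" is assumed to mean a k-mer that occurs exactly once in each read, and "consistent" is assumed to mean that the spacing between successive anchors differs by at most `max_drift` bases between the two reads. The function names and both criteria are hypothetical.

```python
from collections import Counter


def unique_kmers(read, k):
    """Map each k-mer that occurs exactly once in the read to its position."""
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def common_kmer_anchors(read_a, read_b, k):
    """Return sorted (pos_in_a, pos_in_b) pairs for k-mers unique in both reads."""
    a = unique_kmers(read_a, k)
    b = unique_kmers(read_b, k)
    return sorted((a[kmer], b[kmer]) for kmer in a.keys() & b.keys())


def chain_consistent(anchors, max_drift):
    """Greedily chain anchors whose spacing agrees between the two reads."""
    chain = []
    for pos_a, pos_b in anchors:
        if not chain:
            chain.append((pos_a, pos_b))
            continue
        last_a, last_b = chain[-1]
        # Assumed consistency rule: same order in both reads, similar spacing.
        drift = abs((pos_a - last_a) - (pos_b - last_b))
        if pos_b > last_b and drift <= max_drift:
            chain.append((pos_a, pos_b))
    return chain


# Toy reads: the suffix of read_a overlaps the prefix of read_b.
read_a = "TTTTGATTACAGGCATCG"
read_b = "GATTACAGGCATCGAAAA"
chain = chain_consistent(common_kmer_anchors(read_a, read_b, k=5), max_drift=2)
```

On real long reads with high error rates, a greedy chain like this would be too naive; the abstract indicates that LROD uses a two-stage consistency test and a further step to validate and revise candidate overlaps.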
Affiliation(s)
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Ranran Chen
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Xiaohong Zhang
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| | - Yan Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Zhanqiang Huo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
| |
|
366
|
Abstract
Diversity within the fungal kingdom is evident from the wide range of morphologies fungi display as well as the various ecological roles and industrial purposes they serve. Technological advances, particularly in long-read sequencing, coupled with the increasing efficiency and decreasing costs across sequencing platforms have enabled robust characterization of fungal genomes. These sequencing efforts continue to reveal the rampant diversity in fungi at the genome level. Here, we discuss studies that have furthered our understanding of fungal genetic diversity and genomic evolution. These studies revealed the presence of both small-scale and large-scale genomic changes. In fungi, research has recently focused on many small-scale changes, such as how hypermutation and allelic transmission impact genome evolution as well as how and why a few specific genomic regions are more susceptible to rapid evolution than others. High-throughput sequencing of a diverse set of fungal genomes has also illuminated the frequency, mechanisms, and impacts of large-scale changes, which include chromosome structural variation and changes in chromosome number, such as aneuploidy, polyploidy, and the presence of supernumerary chromosomes. The studies discussed herein have provided great insight into how the architecture of the fungal genome varies within species and across the kingdom and how modern fungi may have evolved from the last common fungal ancestor and might also pave the way for understanding how genomic diversity has evolved in all domains of life.
Affiliation(s)
- Shelby J. Priest
- Department of Molecular Genetics and Microbiology, Duke University Medical Centre, Durham, NC, USA
| | - Vikas Yadav
- Department of Molecular Genetics and Microbiology, Duke University Medical Centre, Durham, NC, USA
| | - Joseph Heitman
- Department of Molecular Genetics and Microbiology, Duke University Medical Centre, Durham, NC, USA
| |
|
367
|
Kutzner A, Kim PS, Schmidt M. A performant bridge between fixed-size and variable-size seeding. BMC Bioinformatics 2020; 21:328. [PMID: 32703211 PMCID: PMC7376731 DOI: 10.1186/s12859-020-03642-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/06/2020] [Accepted: 07/02/2020] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Seeding is usually the initial step of high-throughput sequence aligners. Two popular seeding strategies are fixed-size seeding (k-mers, minimizers) and variable-size seeding (MEMs, SMEMs, maximal spanning seeds). The former strategy supports fast seed computation, while the latter one benefits from a high seed uniqueness. Algorithmic bridges between instances of both seeding strategies are of interest for combining their respective advantages. RESULTS We introduce an efficient strategy for computing MEMs out of fixed-size seeds (k-mers or minimizers). In contrast to previously proposed extend-purge strategies, our merge-extend strategy prevents the creation and filtering of duplicate MEMs. Further, we describe techniques for extracting SMEMs or maximal spanning seeds out of MEMs. A comprehensive benchmarking shows the applicability, strengths, shortcomings and computational requirements of all discussed seeding techniques. Additionally, we report the effects of seed occurrence filters in the context of these techniques. Aside from our novel algorithmic approaches, we analyze hierarchies within fixed-size and variable-size seeding along with a mapping between instances of both seeding strategies. CONCLUSION Benchmarking shows that our proposed merge-extend strategy for MEM computation outperforms previous extend-purge strategies in the context of PacBio reads. The observed superiority grows with increasing read size and read quality. Further, the presented filters for extracting SMEMs or maximal spanning seeds out of MEMs outperform FMD-index based extension techniques. All code used for benchmarking is available via GitHub at https://github.com/ITBE-Lab/seed-evaluation.
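The relationship between fixed-size seeds and MEMs can be illustrated with a small sketch. It is not the authors' merge-extend algorithm: it simply takes every k-mer match between a query and a reference, groups the matches by diagonal, and merges runs of adjacent matches. When all k-mers are used (rather than minimizers), each merged run is a maximal exact match of length at least k, so no separate extension step is needed in this simplified setting. Function names are hypothetical.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    return index


def mems_from_kmers(query, ref, k):
    """Merge adjacent k-mer matches on each diagonal into maximal exact matches.

    Returns sorted (query_start, ref_start, length) tuples with length >= k.
    """
    ref_index = kmer_index(ref, k)
    by_diagonal = defaultdict(list)
    for q_pos in range(len(query) - k + 1):
        for r_pos in ref_index.get(query[q_pos:q_pos + k], ()):
            # Matches on the same diagonal share the same ref - query offset.
            by_diagonal[r_pos - q_pos].append(q_pos)

    mems = []
    for diag, q_positions in by_diagonal.items():
        # q_positions is already ascending because q_pos was scanned in order.
        start = prev = q_positions[0]
        for q_pos in q_positions[1:]:
            if q_pos == prev + 1:
                prev = q_pos  # adjacent k-mer match: grow the current run
            else:
                mems.append((start, start + diag, prev - start + k))
                start = prev = q_pos
        mems.append((start, start + diag, prev - start + k))
    return sorted(mems)


query = "GATTACA"
ref = "CCGATTACATT"
mems = mems_from_kmers(query, ref, k=3)
```

With minimizers instead of all k-mers, neighbouring seeds on a diagonal are no longer guaranteed to be adjacent, which is where an explicit extension step of the kind the abstract describes becomes necessary.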
Affiliation(s)
- Arne Kutzner
- Department of Information Systems, College of Engineering, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea
| | - Pok-Son Kim
- Department of Information Security, Cryptology, and Mathematics, Kookmin University, 77, Jeongneung-ro, Seongbuk-gu, Seoul, 02707, Republic of Korea
| | - Markus Schmidt
- Department of Information Systems, College of Engineering, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea.
| |
|
368
|
Alonge M, Wang X, Benoit M, Soyk S, Pereira L, Zhang L, Suresh H, Ramakrishnan S, Maumus F, Ciren D, Levy Y, Harel TH, Shalev-Schlosser G, Amsellem Z, Razifard H, Caicedo AL, Tieman DM, Klee H, Kirsche M, Aganezov S, Ranallo-Benavidez TR, Lemmon ZH, Kim J, Robitaille G, Kramer M, Goodwin S, McCombie WR, Hutton S, Van Eck J, Gillis J, Eshed Y, Sedlazeck FJ, van der Knaap E, Schatz MC, Lippman ZB. Major Impacts of Widespread Structural Variation on Gene Expression and Crop Improvement in Tomato. Cell 2020; 182:145-161.e23. [PMID: 32553272 PMCID: PMC7354227 DOI: 10.1016/j.cell.2020.05.021] [Citation(s) in RCA: 444] [Impact Index Per Article: 88.8] [Received: 02/06/2020] [Revised: 04/10/2020] [Accepted: 05/12/2020] [Indexed: 12/22/2022]
Abstract
Structural variants (SVs) underlie important crop improvement and domestication traits. However, resolving the extent, diversity, and quantitative impact of SVs has been challenging. We used long-read nanopore sequencing to capture 238,490 SVs in 100 diverse tomato lines. This panSV genome, along with 14 new reference assemblies, revealed large-scale intermixing of diverse genotypes, as well as thousands of SVs intersecting genes and cis-regulatory regions. Hundreds of SV-gene pairs exhibit subtle and significant expression changes, which could broadly influence quantitative trait variation. By combining quantitative genetics with genome editing, we show how multiple SVs that changed gene dosage and expression levels modified fruit flavor, size, and production. In the last example, higher order epistasis among four SVs affecting three related transcription factors allowed introduction of an important harvesting trait in modern tomato. Our findings highlight the underexplored role of SVs in genotype-to-phenotype relationships and their widespread importance and utility in crop improvement.
Affiliation(s)
- Michael Alonge
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Xingang Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Matthias Benoit
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sebastian Soyk
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Lara Pereira
- Center for Applied Genetic Technologies, Genetics & Genomics, University of Georgia, Athens, GA 30602, USA
| | - Lei Zhang
- Center for Applied Genetic Technologies, Genetics & Genomics, University of Georgia, Athens, GA 30602, USA
| | - Hamsini Suresh
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | | | - Florian Maumus
- URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
| | - Danielle Ciren
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Yuval Levy
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Tom Hai Harel
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Gili Shalev-Schlosser
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Ziva Amsellem
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Hamid Razifard
- Institute for Applied Life Sciences, University of Massachusetts Amherst, Amherst, MA 01003, USA; Department of Biology, University of Massachusetts Amherst, Amherst, MA 01003, USA
| | - Ana L Caicedo
- Institute for Applied Life Sciences, University of Massachusetts Amherst, Amherst, MA 01003, USA; Department of Biology, University of Massachusetts Amherst, Amherst, MA 01003, USA
| | - Denise M Tieman
- Horticultural Sciences, Plant Innovation Center, University of Florida, Gainesville, FL 32611, USA
| | - Harry Klee
- Horticultural Sciences, Plant Innovation Center, University of Florida, Gainesville, FL 32611, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | | | - Zachary H Lemmon
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Jennifer Kim
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Gina Robitaille
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Melissa Kramer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - W Richard McCombie
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Samuel Hutton
- Gulf Coast Research and Education Center, University of Florida, Wimauma, FL 33598, USA
| | - Joyce Van Eck
- Boyce Thompson Institute, Ithaca, NY 14853, USA; Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Jesse Gillis
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Yuval Eshed
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Esther van der Knaap
- Center for Applied Genetic Technologies, Genetics & Genomics, University of Georgia, Athens, GA 30602, USA; Institute of Plant Breeding, Genetics and Genomics, University of Georgia, Athens, GA 30602, USA; Department of Horticulture, University of Georgia, Athens, GA 30602, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA.
| | - Zachary B Lippman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
| |
|
369
|
Weissensteiner MH, Bunikis I, Catalán A, Francoijs KJ, Knief U, Heim W, Peona V, Pophaly SD, Sedlazeck FJ, Suh A, Warmuth VM, Wolf JBW. Discovery and population genomics of structural variation in a songbird genus. Nat Commun 2020; 11:3403. [PMID: 32636372 PMCID: PMC7341801 DOI: 10.1038/s41467-020-17195-4] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 12/06/2019] [Accepted: 06/16/2020] [Indexed: 02/07/2023] Open
Abstract
Structural variation (SV) constitutes an important type of genetic mutation providing the raw material for evolution. Here, we uncover the genome-wide spectrum of intra- and interspecific SV segregating in natural populations of seven songbird species in the genus Corvus. Combining short-read (N = 127) and long-read re-sequencing (N = 31), as well as optical mapping (N = 16), we apply both assembly-based and read-mapping approaches to detect SV and characterize a total of 220,452 insertions, deletions and inversions. We exploit sampling across wide phylogenetic timescales to validate SV genotypes and assess the contribution of SV to evolutionary processes in an avian model of incipient speciation. We reveal an evolutionarily young (~530,000 years) cis-acting 2.25-kb LTR retrotransposon insertion reducing expression of the NDP gene with consequences for premating isolation. Our results attest to the wealth and evolutionary significance of SV segregating in natural populations and highlight the need for reliable SV genotyping.
Affiliation(s)
- Matthias H Weissensteiner
- Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden.
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany.
- Department of Biology, Pennsylvania State University, 310 Wartik Lab, University Park, PA, 16802, USA.
| | - Ignas Bunikis
- Uppsala Genome Center, Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, BMC, Box 815, 752 37, Uppsala, Sweden
| | - Ana Catalán
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
| | | | - Ulrich Knief
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
| | - Wieland Heim
- Institute of Landscape Ecology, University of Münster, Heisenbergstrasse 2, 48149, Münster, Germany
| | - Valentina Peona
- Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Uppsala University, 752 36, Uppsala, Sweden
| | - Saurabh D Pophaly
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
- Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center at Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
| | - Alexander Suh
- Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Uppsala University, 752 36, Uppsala, Sweden
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TU, UK
| | - Vera M Warmuth
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany
| | - Jochen B W Wolf
- Department of Evolutionary Biology and Science for Life Laboratory, Uppsala University, 752 36, Uppsala, Sweden.
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Grosshaderner Str. 2, 82152, Planegg-Martinsried, Germany.
| |
|
370
|
Sapoval N, Mahmoud M, Jochum MD, Liu Y, Elworth RAL, Wang Q, Albin D, Ogilvie H, Lee MD, Villapol S, Hernandez KM, Berry IM, Foox J, Beheshti A, Ternus K, Aagaard KM, Posada D, Mason CE, Sedlazeck F, Treangen TJ. Hidden genomic diversity of SARS-CoV-2: implications for qRT-PCR diagnostics and transmission. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.07.02.184481. [PMID: 32637955 PMCID: PMC7337385 DOI: 10.1101/2020.07.02.184481] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
Abstract
The COVID-19 pandemic has sparked an urgent need to uncover the underlying biology of this devastating disease. Though RNA viruses mutate more rapidly than DNA viruses, there are a relatively small number of single nucleotide polymorphisms (SNPs) that differentiate the main SARS-CoV-2 clades that have spread throughout the world. In this study, we investigated over 7,000 SARS-CoV-2 datasets to unveil both intrahost and interhost diversity. Our intrahost and interhost diversity analyses yielded three major observations. First, the mutational profile of SARS-CoV-2 highlights iSNV and SNP similarity, albeit with high variability in C>T changes. Second, iSNV and SNP patterns in SARS-CoV-2 are more similar to MERS-CoV than SARS-CoV-1. Third, a significant fraction of small indels fuel the genetic diversity of SARS-CoV-2. Altogether, our findings provide insight into SARS-CoV-2 genomic diversity, inform the design of detection tests, and highlight the potential of iSNVs for tracking the transmission of SARS-CoV-2.
Affiliation(s)
- Nicolae Sapoval
  - Department of Computer Science, Rice University, Houston, TX, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX
- Michael D. Jochum
  - Baylor College of Medicine and Texas Children’s Hospital, Houston, TX
- Yunxi Liu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Qi Wang
  - Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX
- Dreycey Albin
  - Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX
- Huw Ogilvie
  - Department of Computer Science, Rice University, Houston, TX, USA
- Michael D. Lee
  - Exobiology Branch, NASA Ames Research Center, Mountain View, CA
  - Blue Marble Space Institute of Science, Seattle, WA
- Kyle M. Hernandez
  - Department of Medicine, University of Chicago, Chicago, IL
  - Center for Translational Data Science, University of Chicago, Chicago, IL
- Jonathan Foox
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York
- Afshin Beheshti
  - KBR, Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA
- Krista Ternus
  - Signature Science, LLC, 8329 North Mopac Expressway, Austin TX 78759
- David Posada
  - Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology School of Biology, University of Vigo, Vigo, Spain
  - Galicia Sur Health Research Institute, 36310 Vigo, Spain
- Christopher E. Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York
- Fritz Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX
- Todd J. Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
371
A robust benchmark for detection of germline large deletions and insertions. Nat Biotechnol 2020; 38:1347-1355. [PMID: 32541955 PMCID: PMC8454654 DOI: 10.1038/s41587-020-0538-8] [Citation(s) in RCA: 227] [Impact Index Per Article: 45.4] [Received: 07/16/2019] [Accepted: 04/28/2020] [Indexed: 12/19/2022]
Abstract
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed the first sequence-resolved benchmark set for identification of both false negative and false positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12745 isolated, sequence-resolved insertion (7281) and deletion (5464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5262 insertions and 4095 deletions supported by ≥1 diploid assembly. We demonstrate the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
372
Li X, Yang J, Shen M, Xie XL, Liu GJ, Xu YX, Lv FH, Yang H, Yang YL, Liu CB, Zhou P, Wan PC, Zhang YS, Gao L, Yang JQ, Pi WH, Ren YL, Shen ZQ, Wang F, Deng J, Xu SS, Salehian-Dehkordi H, Hehua E, Esmailizadeh A, Dehghani-Qanatqestani M, Štěpánek O, Weimann C, Erhardt G, Amane A, Mwacharo JM, Han JL, Hanotte O, Lenstra JA, Kantanen J, Coltman DW, Kijas JW, Bruford MW, Periasamy K, Wang XH, Li MH. Whole-genome resequencing of wild and domestic sheep identifies genes associated with morphological and agronomic traits. Nat Commun 2020; 11:2815. [PMID: 32499537 PMCID: PMC7272655 DOI: 10.1038/s41467-020-16485-1] [Citation(s) in RCA: 176] [Impact Index Per Article: 35.2] [Received: 11/06/2019] [Accepted: 05/04/2020] [Indexed: 01/15/2023] Open
Abstract
Understanding the genetic changes underlying phenotypic variation in sheep (Ovis aries) may facilitate our efforts towards further improvement. Here, we report the deep resequencing of 248 sheep including the wild ancestor (O. orientalis), landraces, and improved breeds. We explored the sheep variome and selection signatures. We detected genomic regions harboring genes associated with distinct morphological and agronomic traits, which may be past and potential future targets of domestication, breeding, and selection. Furthermore, we found non-synonymous mutations in a set of plausible candidate genes and significant differences in their allele frequency distributions across breeds. We identified PDGFD as a likely causal gene for fat deposition in the tails of sheep through transcriptome, RT-PCR, qPCR, and Western blot analyses. Our results provide insights into the demographic history of sheep and a valuable genomic resource for future genetic studies and improved genome-assisted breeding of sheep and other domestic animals.
Affiliation(s)
- Xin Li
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Ji Yang
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Min Shen
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Xing-Long Xie
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Guang-Jian Liu
  - Novogene Bioinformatics Institute, Beijing, 100083, China
- Ya-Xi Xu
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Feng-Hua Lv
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Hua Yang
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Yong-Lin Yang
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Chang-Bin Liu
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Ping Zhou
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Peng-Cheng Wan
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Yun-Sheng Zhang
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Lei Gao
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Jing-Quan Yang
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Wen-Hui Pi
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Yan-Ling Ren
  - Shandong Binzhou Academy of Animal Science and Veterinary Medicine, Binzhou, 256600, China
- Zhi-Qiang Shen
  - Shandong Binzhou Academy of Animal Science and Veterinary Medicine, Binzhou, 256600, China
- Feng Wang
  - Institute of Sheep and Goat Science, Nanjing Agricultural University, Nanjing, 210095, China
- Juan Deng
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, 611130, China
- Song-Song Xu
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Hosein Salehian-Dehkordi
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - University of Chinese Academy of Sciences (UCAS), Beijing, 100049, China
- Eer Hehua
  - Grass-Feeding Livestock Engineering Technology Research Center, Ningxia Academy of Agriculture and Forestry Sciences, Yinchuan, China
- Ali Esmailizadeh
  - Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran
- Ondřej Štěpánek
  - Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
- Christina Weimann
  - Institute of Animal Breeding and Genetics, Justus Liebig University, Giessen, Germany
- Georg Erhardt
  - Institute of Animal Breeding and Genetics, Justus Liebig University, Giessen, Germany
- Agraw Amane
  - Department of Microbial, Cellular and Molecular Biology, Addis Ababa University, Addis Ababa, Ethiopia
  - LiveGene Program, International Livestock Research Institute, Addis Ababa, Ethiopia
- Joram M Mwacharo
  - Small Ruminant Genomics, International Centre for Agricultural Research in the Dry Areas (ICARDA), Addis Ababa, Ethiopia
- Jian-Lin Han
  - CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China
  - Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi, Kenya
- Olivier Hanotte
  - LiveGene Program, International Livestock Research Institute, Addis Ababa, Ethiopia
  - School of Life Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
  - Center for Tropical Livestock Genetics and Health (CTLGH), the Roslin Institute, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, Scotland, UK
- Johannes A Lenstra
  - Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
- Juha Kantanen
  - Production Systems, Natural Resources Institute Finland (Luke), FI-31600, Jokioinen, Finland
- David W Coltman
  - Department of Biological Sciences, University of Alberta, Edmonton, Alberta, T6G 2E9, Canada
- James W Kijas
  - CSIRO Livestock Industries, St Lucia, Brisbane, QLD, Australia
- Michael W Bruford
  - School of Biosciences, Cardiff University, Cathays Park, Cardiff, CF10 3AX, Wales, UK
  - Sustainable Places Research Institute, Cardiff University, CF10 3BA, Cardiff, Wales, UK
- Kathiravan Periasamy
  - Animal Production and Health Laboratory, Joint FAO/IAEA Division of Nuclear Techniques in Food and Agriculture, International Atomic Energy Agency, Vienna, Austria
- Xin-Hua Wang
  - Institute of Animal Husbandry and Veterinary Medicine, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
  - State Key Laboratory of Sheep Genetic Improvement and Healthy Breeding, Xinjiang Academy of Agricultural and Reclamation Sciences, Shihezi, 832000, China
- Meng-Hua Li
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences (CAS), Beijing, 100101, China
  - College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
373
Analyses of breakpoint junctions of complex genomic rearrangements comprising multiple consecutive microdeletions by nanopore sequencing. J Hum Genet 2020; 65:735-741. [PMID: 32355308 DOI: 10.1038/s10038-020-0762-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/08/2019] [Revised: 04/01/2020] [Accepted: 04/04/2020] [Indexed: 12/19/2022]
Abstract
The widespread use of genomic copy number analysis has revealed many previously unknown genomic structural variations, including some that are more complex. In this study, three consecutive microdeletions were identified in the same chromosome by microarray-based comparative genomic hybridization (aCGH) analysis in a patient with a neurodevelopmental disorder. Subsequent fluorescence in situ hybridization (FISH) analyses unexpectedly suggested complicated translocations and inversions. To better understand the mechanism, breakpoint junctions were analyzed by nanopore sequencing, a new long-read whole-genome sequencing (WGS) tool. The results revealed a new chromosomal disruption, giving rise to four junctions in chromosome 7. According to the sequencing results of the breakpoint junctions, all junctions were considered to be the consequence of multiple double-strand breaks and the reassembly of DNA fragments by nonhomologous end-joining, indicating chromothripsis. KMT2E, located within the deletion region, was considered to be the gene responsible for the clinical features of the patient. Combined use of aCGH and FISH analyses is recommended for the interpretation of structural variations analyzed through WGS.
374
Butler DJ, Mozsary C, Meydan C, Danko D, Foox J, Rosiene J, Shaiber A, Afshinnekoo E, MacKay M, Sedlazeck FJ, Ivanov NA, Sierra M, Pohle D, Zietz M, Gisladottir U, Ramlall V, Westover CD, Ryon K, Young B, Bhattacharya C, Ruggiero P, Langhorst BW, Tanner N, Gawrys J, Meleshko D, Xu D, Steel PAD, Shemesh AJ, Xiang J, Thierry-Mieg J, Thierry-Mieg D, Schwartz RE, Iftner A, Bezdan D, Sipley J, Cong L, Craney A, Velu P, Melnick AM, Hajirasouliha I, Horner SM, Iftner T, Salvatore M, Loda M, Westblade LF, Cushing M, Levy S, Wu S, Tatonetti N, Imielinski M, Rennert H, Mason CE. Shotgun Transcriptome and Isothermal Profiling of SARS-CoV-2 Infection Reveals Unique Host Responses, Viral Diversification, and Drug Interactions. bioRxiv 2020:2020.04.20.048066. [PMID: 32511352 PMCID: PMC7255793 DOI: 10.1101/2020.04.20.048066] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Indexed: 02/07/2023]
Abstract
The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has caused thousands of deaths worldwide, including >18,000 in New York City (NYC) alone. The sudden emergence of this pandemic has highlighted a pressing clinical need for rapid, scalable diagnostics that can detect infection, interrogate strain evolution, and identify novel patient biomarkers. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs, plus a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, bacterial, and viral profiling. We applied both technologies across 857 SARS-CoV-2 clinical specimens and 86 NYC subway samples, providing a broad molecular portrait of the COVID-19 NYC outbreak. Our results define new features of SARS-CoV-2 evolution, nominate a novel, NYC-enriched viral subclade, reveal specific host responses in interferon, ACE, hematological, and olfaction pathways, and examine risks associated with use of ACE inhibitors and angiotensin receptor blockers. Together, these findings have immediate applications to SARS-CoV-2 diagnostics, public health, and new therapeutic targets.
Affiliation(s)
- Daniel J. Butler
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Cem Meydan
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, NY, USA
- David Danko
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - Tri-Institutional Computational Biol. & Medicine Program, Weill Cornell Medicine, NY, USA
- Jonathan Foox
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
- Joel Rosiene
  - New York Genome Center, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Alon Shaiber
  - New York Genome Center, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
  - Englander Institute for Precision Medicine and the Meyer Cancer Center, Weill Cornell Medicine, NY, USA
- Ebrahim Afshinnekoo
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, NY, USA
- Matthew MacKay
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Fritz J. Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Nikolay A. Ivanov
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - Clinical & Translational Science Center, Weill Cornell Medicine, NY, USA
- Maria Sierra
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Diana Pohle
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany
- Michael Zietz
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, NY, USA
- Undina Gisladottir
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, NY, USA
- Vijendra Ramlall
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, NY, USA
  - Department of Cellular, Molecular Physiology & Biophysics, Columbia University, NY, USA
- Craig D. Westover
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Krista Ryon
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Benjamin Young
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
- Phyllis Ruggiero
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Justyna Gawrys
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Dmitry Meleshko
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - Tri-Institutional Computational Biol. & Medicine Program, Weill Cornell Medicine, NY, USA
- Dong Xu
  - Genomics Resources Core Facility, Weill Cornell Medicine, NY, USA
- Amos J. Shemesh
  - Department of Emergency Medicine, Weill Cornell Medicine, NY, USA
- Jenny Xiang
  - Genomics Resources Core Facility, Weill Cornell Medicine, NY, USA
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, NY, USA
- Jean Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institute of Health, MD, USA
- Danielle Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institute of Health, MD, USA
- Angelika Iftner
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany
- Daniela Bezdan
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany
- John Sipley
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Lin Cong
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Arryn Craney
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Priya Velu
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Iman Hajirasouliha
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - Englander Institute for Precision Medicine and the Meyer Cancer Center, Weill Cornell Medicine, NY, USA
- Stacy M. Horner
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, NC, USA
  - Department of Medicine, Duke University Medical Center, NC, USA
- Thomas Iftner
  - Institute of Medical Virology and Epidemiology of Viral Diseases, University Hospital Tuebingen, Germany
- Mirella Salvatore
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, NY, USA
- Massimo Loda
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Lars F. Westblade
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
  - Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, NY, USA
- Melissa Cushing
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Shawn Levy
  - HudsonAlpha Discovery Institute, Huntsville, AL, USA
- Shixiu Wu
  - Hangzhou Cancer Institute, Hangzhou Cancer Hospital, Hangzhou, China
  - Department of Radiation Oncology, Hangzhou Cancer Hospital, Hangzhou, China
- Nicholas Tatonetti
  - Department of Biomedical Informatics, Department of Systems Biology, Department of Medicine, Institute for Genomic Medicine, Columbia University, NY, USA
- Marcin Imielinski
  - New York Genome Center, NY, USA
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
  - Englander Institute for Precision Medicine and the Meyer Cancer Center, Weill Cornell Medicine, NY, USA
- Hanna Rennert
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, NY, USA
- Christopher E. Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, NY, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, NY, USA
  - WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, NY, USA
  - The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, NY, USA
375
Adaptation to Industrial Stressors Through Genomic and Transcriptional Plasticity in a Bioethanol Producing Fission Yeast Isolate. G3: Genes, Genomes, Genetics 2020; 10:1375-1391. [PMID: 32086247 PMCID: PMC7144085 DOI: 10.1534/g3.119.400986] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/30/2022]
Abstract
Schizosaccharomyces pombe is a model unicellular eukaryote with ties to the basic research, oenology and industrial biotechnology sectors. While most investigations into S. pombe cell biology utilize Leupold’s 972h- laboratory strain background, recent studies have described a wealth of genetic and phenotypic diversity within wild populations of S. pombe, including stress resistance phenotypes which may be of interest to industry. Here we describe the genomic and transcriptomic characterization of Wilmar-P, an S. pombe isolate used for bioethanol production from sugarcane molasses at industrial scale. Novel sequences present in Wilmar-P but not in the laboratory S. pombe genome included multiple coding sequences with near-perfect nucleotide identity to Schizosaccharomyces octosporus sequences. Wilmar-P also contained a ∼100 kb duplication in the right arm of chromosome III, a region harboring ght5+, the gene encoding the predominant hexose transporter. Transcriptomic analysis of Wilmar-P grown in molasses revealed strong downregulation of core environmental stress response genes and upregulation of hexose transporters and drug efflux pumps compared to laboratory S. pombe. Finally, examination of the regulatory network of Scr1, which is involved in the regulation of several genes differentially expressed on molasses, revealed expanded binding of this transcription factor in Wilmar-P compared to laboratory S. pombe in the molasses condition. Together, our results point to both genomic plasticity and transcriptomic adaptation as mechanisms driving phenotypic adaptation of Wilmar-P to the molasses environment, and therefore add to our understanding of genetic diversity within industrial fission yeast strains and the capacity of this strain for commercial-scale bioethanol production.
376
Gilpatrick T, Lee I, Graham JE, Raimondeau E, Bowen R, Heron A, Downs B, Sukumar S, Sedlazeck FJ, Timp W. Targeted nanopore sequencing with Cas9-guided adapter ligation. Nat Biotechnol 2020; 38:433-438. [PMID: 32042167 PMCID: PMC7145730 DOI: 10.1038/s41587-020-0407-5] [Citation(s) in RCA: 253] [Impact Index Per Article: 50.6] [Received: 06/04/2019] [Accepted: 01/06/2020] [Indexed: 12/21/2022]
Abstract
Despite recent improvements in sequencing methods, there remains a need for assays that provide high sequencing depth and comprehensive variant detection. Current methods1-4 are limited by the loss of native modifications, short read length, high input requirements, low yield or long protocols. In the present study, we describe nanopore Cas9-targeted sequencing (nCATS), an enrichment strategy that uses targeted cleavage of chromosomal DNA with Cas9 to ligate adapters for nanopore sequencing. We show that nCATS can simultaneously assess haplotype-resolved single-nucleotide variants, structural variations and CpG methylation. We apply nCATS to four cell lines, to a cell-line-derived xenograft, and to normal and paired tumor/normal primary human breast tissue. Median sequencing coverage was 675× using a MinION flow cell and 34× using the smaller Flongle flow cell. The nCATS sequencing requires only ~3 μg of genomic DNA and can target a large number of loci in a single reaction. The method will facilitate the use of long-read sequencing in research and in the clinic.
Affiliation(s)
- Timothy Gilpatrick
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Isac Lee
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Bradley Downs
  - Department of Oncology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Saraswati Sukumar
  - Department of Oncology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Molecular Biology and Genetics, Department of Medicine, Division of Infectious Disease, Johns Hopkins School of Medicine, Baltimore, MD, USA
377
Chander V, Gibbs RA, Sedlazeck FJ. Evaluation of computational genotyping of structural variation for clinical diagnoses. Gigascience 2020; 8:5565134. [PMID: 31494671 PMCID: PMC6732172 DOI: 10.1093/gigascience/giz110] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 02/01/2019] [Revised: 06/27/2019] [Accepted: 08/13/2019] [Indexed: 01/08/2023] Open
Abstract
Background: Structural variation (SV) plays a pivotal role in genetic disease. The discovery of SVs based on short DNA sequence reads from next-generation DNA sequence methods is error-prone, with low sensitivity and high false discovery rates. These shortcomings can be partially overcome with extensive orthogonal validation methods or use of long reads, but the current cost precludes their application for routine clinical diagnostics. In contrast, SV genotyping of known sites of SV occurrence is relatively robust and therefore offers a cost-effective clinical diagnostic tool with potentially few false-positive and false-negative results, even when applied to short-read DNA sequence data. Results: We assess 5 state-of-the-art SV genotyping software methods, applied to short-read sequence data. The methods are characterized on the basis of their ability to genotype different SV types, spanning different size ranges. Furthermore, we analyze their ability to parse different VCF file subformats and assess their reliance on specific metadata. We compare the SV genotyping methods across a range of simulated and real data, including SVs that were not found with Illumina data alone. We assess sensitivity and the ability to filter initial false discovery calls. We determined the impact of SV type and size on the performance of each SV genotyper. Overall, STIX performed the best on both simulated and GIAB-based SV calls, demonstrating a good balance between sensitivity and specificity. Conclusion: Our results indicate that, although SV genotyping software methods have superior performance to SV callers, there are limitations that suggest the need for further innovation.
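For readers unfamiliar with the metrics named in the abstract above, the following is a minimal Python sketch of how sensitivity, specificity, and false discovery rate are conventionally computed from genotyping outcomes. The function name `genotyping_metrics` and all counts are hypothetical illustrations; they are not code or values from the study.

```python
def genotyping_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return standard classification metrics from confusion-matrix counts.

    tp: true SVs correctly genotyped as present
    fp: false candidate SVs wrongly genotyped as present
    tn: false candidate SVs correctly rejected
    fn: true SVs missed by the genotyper
    """
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true SVs recovered
        "specificity": tn / (tn + fp),   # fraction of false candidates rejected
        "fdr": fp / (fp + tp),           # fraction of reported SVs that are false
    }


# Hypothetical counts, for illustration only (not results from the paper).
metrics = genotyping_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

The sketch assumes every denominator is non-zero; a real evaluation would need to handle empty categories explicitly.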
Affiliation(s)
- Varuna Chander
- Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX 77030, USA
|
378
|
Luan MW, Zhang XM, Zhu ZB, Chen Y, Xie SQ. Evaluating Structural Variation Detection Tools for Long-Read Sequencing Datasets in Saccharomyces cerevisiae. Front Genet 2020; 11:159. [PMID: 32211024 PMCID: PMC7075250 DOI: 10.3389/fgene.2020.00159] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/07/2019] [Accepted: 02/11/2020] [Indexed: 01/01/2023] Open
Abstract
Structural variation (SV) represents a major form of genetic variation that contributes to polymorphic variations, human diseases, and phenotypes in many organisms. Long-read sequencing has been successfully used to identify novel and complex SVs. However, comparison of SV detection tools for long-read sequencing datasets has not been reported. Therefore, we developed an analysis workflow that combined two alignment tools (NGMLR and minimap2) and five callers (Sniffles, Picky, smartie-sv, PBHoney, and NanoSV) to evaluate the SV detection in six datasets of Saccharomyces cerevisiae. The accuracy of SV regions was validated by re-aligning raw reads across diverse alignment tools, SV callers, experimental conditions, and sequencing platforms. The results showed that the difference in SV detection between NGMLR and minimap2 was not significant when using the same caller. PBHoney had the highest average accuracy (89.04%) and Picky had the lowest average accuracy (35.85%). The accuracy of NanoSV, Sniffles, and smartie-sv was 68.67%, 60.47%, and 57.67%, respectively. In addition, smartie-sv and NanoSV detected the most and the fewest SVs, respectively, and significantly more SVs were detected from the PacBio sequencing platform than from ONT (p = 0.000173).
Affiliation(s)
- Mei-Wei Luan
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| | - Xiao-Ming Zhang
- College of Grassland, Resources and Environment, Inner Mongolia Agricultural University, Huhhot, China
| | - Zi-Bin Zhu
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
| | - Ying Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Shang-Qian Xie
- Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, College of Forestry, Hainan University, Haikou, China
|
379
|
Tao YT, Suo F, Tusso S, Wang YK, Huang S, Wolf JBW, Du LL. Intraspecific Diversity of Fission Yeast Mitochondrial Genomes. Genome Biol Evol 2020; 11:2312-2329. [PMID: 31364709 PMCID: PMC6736045 DOI: 10.1093/gbe/evz165] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Accepted: 07/18/2019] [Indexed: 02/07/2023] Open
Abstract
The fission yeast Schizosaccharomyces pombe is an important model organism, but its natural diversity and evolutionary history remain under-studied. In particular, the population genomics of the S. pombe mitochondrial genome (mitogenome) has not been thoroughly investigated. Here, we assembled the complete circular-mapping mitogenomes of 192 S. pombe isolates de novo, and found that these mitogenomes belong to 69 nonidentical sequence types ranging from 17,618 to 26,910 bp in length. Using the assembled mitogenomes, we identified 20 errors in the reference mitogenome and discovered two previously unknown mitochondrial introns. Analyzing sequence diversity of these 69 types of mitogenomes revealed two highly distinct clades, with only three mitogenomes exhibiting signs of inter-clade recombination. This diversity pattern suggests that currently available S. pombe isolates descend from two long-separated ancestral lineages. This conclusion is corroborated by the diversity pattern of the recombination-repressed K-region located between donor mating-type loci mat2 and mat3 in the nuclear genome. We estimated that the two ancestral S. pombe lineages diverged about 31 million generations ago. These findings shed new light on the evolution of S. pombe and the data sets generated in this study will facilitate future research on genome evolution.
Affiliation(s)
- Yu-Tian Tao
- National Institute of Biological Sciences, Beijing, China.,Graduate School of Peking Union Medical College, Beijing, China
| | - Fang Suo
- National Institute of Biological Sciences, Beijing, China
| | - Sergio Tusso
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany.,Science for Life Laboratories, Department of Evolutionary Biology, Uppsala University, Sweden
| | - Yan-Kai Wang
- National Institute of Biological Sciences, Beijing, China
| | - Song Huang
- National Institute of Biological Sciences, Beijing, China.,Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China
| | - Jochen B W Wolf
- Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany.,Science for Life Laboratories, Department of Evolutionary Biology, Uppsala University, Sweden
| | - Li-Lin Du
- National Institute of Biological Sciences, Beijing, China.,Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing, China
|
380
|
De Coster W, Strazisar M, De Rijk P. Critical length in long-read resequencing. NAR Genom Bioinform 2020; 2:lqz027. [PMID: 33575574 PMCID: PMC7671308 DOI: 10.1093/nargab/lqz027] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/13/2019] [Revised: 12/06/2019] [Accepted: 01/02/2020] [Indexed: 12/25/2022] Open
Abstract
Long-read sequencing has substantial advantages for structural variant discovery and phasing of variants compared to short-read technologies, but the required and optimal read length has not been assessed. In this work, we used long reads simulated from human genomes and evaluated structural variant discovery and variant phasing using current best-practice bioinformatics methods. We determined that optimal discovery of structural variants from human genomes can be obtained with reads of at least 20 kb. Haplotyping variants across genes only reaches its optimum with reads of 100 kb. These findings are important for the design of future long-read sequencing projects.
Affiliation(s)
- Wouter De Coster
- VIB-UAntwerp Center for Molecular Neurology, 2610 Antwerp, Belgium
| | - Mojca Strazisar
- VIB-UAntwerp Center for Molecular Neurology, 2610 Antwerp, Belgium
| | - Peter De Rijk
- VIB-UAntwerp Center for Molecular Neurology, 2610 Antwerp, Belgium
|
381
|
Balachandran P, Beck CR. Structural variant identification and characterization. Chromosome Res 2020; 28:31-47. [PMID: 31907725 PMCID: PMC7131885 DOI: 10.1007/s10577-019-09623-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/15/2019] [Revised: 10/15/2019] [Accepted: 11/24/2019] [Indexed: 01/06/2023]
Abstract
Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.
Affiliation(s)
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA.
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA.
|
382
|
Abstract
Identifying structural variation (SV) is essential for genome interpretation but has been historically difficult due to limitations inherent to available genome technologies. Detection methods that use ensemble algorithms and emerging sequencing technologies have enabled the discovery of thousands of SVs, uncovering information about their ubiquity, relationship to disease and possible effects on biological mechanisms. Given the variability in SV type and size, along with unique detection biases of emerging genomic platforms, multiplatform discovery is necessary to resolve the full spectrum of variation. Here, we review modern approaches for investigating SVs and proffer that, moving forwards, studies integrating biological information with detection will be necessary to comprehensively understand the impact of SV in the human genome.
Affiliation(s)
- Steve S Ho
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Ryan E Mills
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
|
383
|
Pollo SMJ, Reiling SJ, Wit J, Workentine ML, Guy RA, Batoff GW, Yee J, Dixon BR, Wasmuth JD. Benchmarking hybrid assemblies of Giardia and prediction of widespread intra-isolate structural variation. Parasit Vectors 2020; 13:108. [PMID: 32111234 PMCID: PMC7048089 DOI: 10.1186/s13071-020-3968-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/03/2019] [Accepted: 02/13/2020] [Indexed: 01/02/2023] Open
Abstract
Background Currently available short read genome assemblies of the tetraploid protozoan parasite Giardia intestinalis are highly fragmented, highlighting the need for improved genome assemblies at a reasonable cost. Long nanopore reads are well suited to resolve repetitive genomic regions resulting in better quality assemblies of eukaryotic genomes. Subsequent addition of highly accurate short reads to long-read assemblies further improves assembly quality. Using this hybrid approach, we assembled genomes for three Giardia isolates, two with published assemblies and one novel, to evaluate the improvement in genome quality gained from long reads. We then used the long reads to predict structural variants to examine this previously unexplored source of genetic variation in Giardia. Methods With MinION reads for each isolate, we assembled genomes using several assemblers specializing in long reads. Assembly metrics, gene finding, and whole genome alignments to the reference genomes enabled direct comparison to evaluate the performance of the nanopore reads. Further improvements from adding Illumina reads to the long-read assemblies were evaluated using gene finding. Structural variants were predicted from alignments of the long reads to the best hybrid genome for each isolate and enrichment of key genes was analyzed using random genome sampling and calculation of percentiles to find thresholds of significance. Results Our hybrid assembly method generated reference quality genomes for each isolate. Consistent with previous findings based on SNPs, examination of heterozygosity using the structural variants found that Giardia BGS was considerably more heterozygous than the other isolates that are from Assemblage A. Further, each isolate was shown to contain structural variant regions enriched for variant-specific surface proteins, a key class of virulence factor in Giardia. 
Conclusions The ability to generate reference quality genomes from a single MinION run and a multiplexed MiSeq run enables future large-scale comparative genomic studies within the genus Giardia. Further, prediction of structural variants from long reads allows for more in-depth analyses of major sources of genetic variation within and between Giardia isolates that could have effects on both pathogenicity and host range.
Affiliation(s)
- Stephen M J Pollo
- Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada.,Host-Parasite Interactions Training Program, University of Calgary, Calgary, AB, Canada
| | - Sarah J Reiling
- Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
| | - Janneke Wit
- Host-Parasite Interactions Training Program, University of Calgary, Calgary, AB, Canada.,Department of Comparative Biology and Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
| | - Matthew L Workentine
- Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
| | - Rebecca A Guy
- Division of Enteric Diseases, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON, Canada
| | - G William Batoff
- Department of Biology, Biochemistry and Molecular Biology Program, Trent University, Peterborough, ON, Canada
| | - Janet Yee
- Department of Biology, Biochemistry and Molecular Biology Program, Trent University, Peterborough, ON, Canada
| | - Brent R Dixon
- Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
| | - James D Wasmuth
- Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada. .,Host-Parasite Interactions Training Program, University of Calgary, Calgary, AB, Canada.
|
384
|
Liu Y, Zhang M, Sun J, Chang W, Sun M, Zhang S, Wu J. Comparison of multiple algorithms to reliably detect structural variants in pears. BMC Genomics 2020; 21:61. [PMID: 31959124 PMCID: PMC6972009 DOI: 10.1186/s12864-020-6455-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/17/2019] [Accepted: 01/07/2020] [Indexed: 01/01/2023] Open
Abstract
Background Structural variations (SVs) have been reported to play an important role in genetic diversity and trait regulation. Many computer algorithms detecting SVs have recently been developed, but the use of multiple algorithms to detect high-confidence SVs has not been studied. The most suitable sequencing depth for detecting SVs in pear is also not known. Results In this study, a pipeline to detect SVs using next-generation and long-read sequencing data was constructed. The performances of seven types of SV detection software using next-generation sequencing (NGS) data and two types of software using long-read sequencing data (SVIM and Sniffles), which are based on different algorithms, were compared. Of the nine software packages evaluated, SVIM identified the most SVs, and Sniffles detected SVs with the highest accuracy (> 90%). When the results from multiple SV detection tools were combined, the SVs identified by both MetaSV and IMR/DENOM, which use NGS data, were more accurate than those identified by both SVIM and Sniffles, with mean accuracies of 98.7 and 96.5%, respectively. The software packages using long-read sequencing data required fewer CPU cores and less memory and ran faster than those using NGS data. In addition, according to the performances of assembly-based algorithms using NGS data, we found that a sequencing depth of 50× is appropriate for detecting SVs in the pear genome. Conclusion This study provides strong evidence that more than one SV detection software package, each based on a different algorithm, should be used to detect SVs with higher confidence, and that long-read sequencing data are better than NGS data for SV detection. The SV detection pipeline that we have established will facilitate the study of diversity in other crops.
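The idea of keeping only SVs reported by more than one detection tool can be illustrated with a minimal Python sketch. It is not the pipeline from the paper: the `SV` record, the function names, the example coordinates, and the 50% reciprocal-overlap threshold are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record (not a format used by any specific tool)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    svtype: str  # e.g. "DEL", "DUP", "INV"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if the calls do not match."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def consensus_calls(calls_a, calls_b, min_ro=0.5):
    """Keep calls from calls_a that are supported by at least one call in calls_b.

    min_ro is an assumed threshold, not a value taken from the abstract.
    """
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)]


# Made-up example callsets from two hypothetical callers.
caller_one = [
    SV("chr1", 1000, 2000, "DEL"),
    SV("chr1", 5000, 5600, "DUP"),
    SV("chr2", 100, 900, "DEL"),
]
caller_two = [
    SV("chr1", 1100, 2100, "DEL"),
    SV("chr1", 5000, 9000, "DUP"),
]

high_confidence = consensus_calls(caller_one, caller_two)
print(high_confidence)
```

In this toy input only the first deletion is retained: the duplication is rejected because the second caller's interval is much larger, so the overlap is not reciprocal. Real pipelines typically also compare breakpoint confidence intervals and handle insertions, which this sketch does not attempt.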
Affiliation(s)
- Yueyuan Liu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Mingyue Zhang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Jieying Sun
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Wenjing Chang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Manyi Sun
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Shaoling Zhang
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China
| | - Jun Wu
- Center of Pear Engineering Technology Research, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, Jiangsu, China.
|
385
|
Kuzniar A, Maassen J, Verhoeven S, Santuari L, Shneider C, Kloosterman WP, de Ridder J. sv-callers: a highly portable parallel workflow for structural variant detection in whole-genome sequence data. PeerJ 2020; 8:e8214. [PMID: 31934500 PMCID: PMC6951283 DOI: 10.7717/peerj.8214] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/15/2019] [Accepted: 11/14/2019] [Indexed: 12/19/2022] Open
Abstract
Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases including cancer. Despite the advances in whole genome sequencing, comprehensive and accurate detection of SVs in short-read data still poses some practical and computational challenges. We present sv-callers, a highly portable workflow that enables parallel execution of multiple SV detection tools and provides users with example analyses of detected SV callsets in a Jupyter Notebook. This workflow supports easy deployment of software dependencies, configuration and addition of new analysis tools. Moreover, porting it to different computing systems requires minimal effort. Finally, we demonstrate the utility of the workflow by performing both somatic and germline SV analyses on different high-performance computing systems.
Affiliation(s)
| | | | | | - Luca Santuari
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
| | - Carl Shneider
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
| | - Wigard P Kloosterman
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
| | - Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
|
386
|
Goel M, Sun H, Jiao WB, Schneeberger K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol 2019; 20:277. [PMID: 31842948 PMCID: PMC6913012 DOI: 10.1186/s13059-019-1911-0] [Citation(s) in RCA: 449] [Impact Index Per Article: 74.8] [Received: 04/11/2019] [Accepted: 12/02/2019] [Indexed: 01/27/2023] Open
Abstract
Genomic differences range from single nucleotide differences to complex structural variations. Current methods typically annotate sequence differences ranging from SNPs to large indels accurately but do not unravel the full complexity of structural rearrangements, including inversions, translocations, and duplications, where highly similar sequence changes in location, orientation, or copy number. Here, we present SyRI, a pairwise whole-genome comparison tool for chromosome-level assemblies. SyRI starts by finding rearranged regions and then searches for differences in the sequences, which are distinguished by whether they reside in syntenic or rearranged regions. This distinction is important as rearranged regions are inherited differently compared to syntenic regions.
Affiliation(s)
- Manish Goel
- Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
| | - Hequan Sun
- Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
| | - Wen-Biao Jiao
- Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
| | - Korbinian Schneeberger
- Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
- Faculty of Biology, LMU Munich, 82152 Planegg-Martinsried, Germany
|
387
|
Arora K, Shah M, Johnson M, Sanghvi R, Shelton J, Nagulapalli K, Oschwald DM, Zody MC, Germer S, Jobanputra V, Carter J, Robine N. Deep whole-genome sequencing of 3 cancer cell lines on 2 sequencing platforms. Sci Rep 2019; 9:19123. [PMID: 31836783 PMCID: PMC6911065 DOI: 10.1038/s41598-019-55636-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 06/04/2019] [Accepted: 11/30/2019] [Indexed: 12/15/2022] Open
Abstract
To test the performance of a new sequencing platform, develop an updated somatic calling pipeline and establish a reference for future benchmarking experiments, we performed whole-genome sequencing of 3 common cancer cell lines (COLO-829, HCC-1143 and HCC-1187) along with their matched normal cell lines to great sequencing depths (up to 278x coverage) on both Illumina HiSeqX and NovaSeq sequencing instruments. Somatic calling was generally consistent between the two platforms despite minor differences at the read level. We designed and implemented a novel pipeline for the analysis of tumor-normal samples, using multiple variant callers. We show that, coupled with a high-confidence filtering strategy, the use of a combination of tools improves the accuracy of somatic variant calling. We also demonstrate the utility of the dataset by creating an artificial purity ladder to evaluate the somatic pipeline and benchmark methods for estimating purity and ploidy from tumor-normal pairs. The data and results of the pipeline are made accessible to the cancer genomics community.
Affiliation(s)
- Kanika Arora
- New York Genome Center, New York, NY, 10013, USA
| | - Minita Shah
- New York Genome Center, New York, NY, 10013, USA
| | | | | | | | | | | | | | - Soren Germer
- New York Genome Center, New York, NY, 10013, USA
| | | | - Jade Carter
- New York Genome Center, New York, NY, 10013, USA
|
388
|
Eggertsson HP, Kristmundsdottir S, Beyter D, Jonsson H, Skuladottir A, Hardarson MT, Gudbjartsson DF, Stefansson K, Halldorsson BV, Melsted P. GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs. Nat Commun 2019; 10:5402. [PMID: 31776332 PMCID: PMC6881350 DOI: 10.1038/s41467-019-13341-9] [Citation(s) in RCA: 98] [Impact Index Per Article: 16.3] [Received: 04/23/2019] [Accepted: 10/30/2019] [Indexed: 12/31/2022] Open
Abstract
Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each has a relatively large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short reads. Comparison to the syndip benchmark dataset shows that our SV genotyping is sensitive, and variant segregation in families demonstrates the accuracy of our approach. We demonstrate that incorporating public assembly data into our pipeline greatly improves sensitivity, particularly for large insertions. We validate 6,812 SVs on average per genome using long-read data of 41 Icelanders. We show that GraphTyper2 can simultaneously genotype tens of thousands of whole genomes by characterizing 60 million small variants and half a million SVs in 49,962 Icelanders, including 80 thousand SVs with high confidence.
Affiliation(s)
- Hannes P Eggertsson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland.
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland.
| | - Snaedis Kristmundsdottir
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
- School of Science and Engineering, Reykjavik University, Reykjavik, Iceland
| | - Doruk Beyter
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
| | - Hakon Jonsson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
| | | | | | - Daniel F Gudbjartsson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
| | - Kari Stefansson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavik, Iceland
| | - Bjarni V Halldorsson
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland.
- School of Science and Engineering, Reykjavik University, Reykjavik, Iceland.
| | - Pall Melsted
- deCODE genetics/Amgen Inc., Sturlugata 8, Reykjavik, Iceland.
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland.
|
389
|
Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Structural variant calling: the long and the short of it. Genome Biol 2019; 20:246. [PMID: 31747936 PMCID: PMC6868818 DOI: 10.1186/s13059-019-1828-7] [Citation(s) in RCA: 378] [Impact Index Per Article: 63.0] [Received: 04/11/2019] [Accepted: 09/19/2019] [Indexed: 02/08/2023] Open
Abstract
Recent research into structural variants (SVs) has established their importance to medicine and molecular biology, elucidating their role in various diseases, regulation of gene expression, ethnic diversity, and large-scale chromosome evolution, giving rise to the differences within populations and among species. Nevertheless, characterizing SVs and determining the optimal approach for a given experimental design remains a computational and scientific challenge. Multiple approaches have emerged to target various SV classes, zygosities, and size ranges. Here, we review these approaches with respect to their ability to infer SVs across the full spectrum of large, complex variations and present computational methods for each approach.
Affiliation(s)
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, USA
| | - Nastassia Gobet
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Diana Ivette Cruz-Dávalos
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Ninon Mounier
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Center for Primary Care and Public Health, Lausanne, Switzerland
| | - Christophe Dessimoz
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution & Environment, University College London, London, UK.
- Department of Computer Science, University College London, London, UK.
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, USA.
|
390
|
Wijfjes RY, Smit S, de Ridder D. Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data. BMC Genomics 2019; 20:818. [PMID: 31699036 PMCID: PMC6836508 DOI: 10.1186/s12864-019-6153-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/31/2019] [Accepted: 09/30/2019] [Indexed: 01/27/2023] Open
Abstract
Background Copy number variation (CNV) is thought to actively contribute to adaptive evolution of plant species. While many computational algorithms are available to detect copy number variation from whole genome sequencing datasets, the typical complexity of plant data likely introduces false positive calls. Results To enable reliable and comprehensive detection of CNV in plant genomes, we developed Hecaton, a novel computational workflow tailored to plants, that integrates calls from multiple state-of-the-art algorithms through a machine-learning approach. In this paper, we demonstrate that Hecaton outperforms current methods when applied to short read sequencing data of Arabidopsis thaliana, rice, maize, and tomato. Moreover, it correctly detects dispersed duplications, a type of CNV commonly found in plant species, in contrast to several state-of-the-art tools that erroneously represent this type of CNV as overlapping deletions and tandem duplications. Finally, Hecaton scales well in terms of memory usage and running time when applied to short read datasets of domesticated and wild tomato accessions. Conclusions Hecaton provides a robust method to detect CNV in plants. We expect it to be of immediate interest to both applied and fundamental research on the relationship between genotype and phenotype in plants.
Affiliation(s)
- Raúl Y Wijfjes
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands.
| | - Sandra Smit
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
| | - Dick de Ridder
- Bioinformatics Group, Wageningen University & Research, Wageningen, the Netherlands
| |
Collapse
|
391
|
Yokoyama TT, Sakamoto Y, Seki M, Suzuki Y, Kasahara M. MoMI-G: modular multi-scale integrated genome graph browser. BMC Bioinformatics 2019; 20:548. [PMID: 31690272 PMCID: PMC6833150 DOI: 10.1186/s12859-019-3145-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 06/11/2019] [Accepted: 10/09/2019] [Indexed: 01/30/2023] Open
Abstract
Background: The genome graph is an emerging approach for representing structural variants on genomes with branches. For example, representing structural variants of cancer genomes as a genome graph is more natural than representing such genomes as differences from the linear reference genome. While more and more structural variants are being identified by long-read sequencing, many of them are difficult to visualize using existing structural variant visualization tools. A visualization method for large genome graphs, such as human cancer genome graphs, is therefore needed. Results: We developed the MOdular Multi-scale Integrated Genome graph browser, MoMI-G, a web-based genome graph browser that can visualize genome graphs with structural variants and supporting evidence such as read alignments, read depth, and annotations. This browser allows more intuitive recognition of large, nested, and potentially more complex structural variations. MoMI-G has view modules for different scales, which allow users to view anything from the whole genome down to nucleotide-level alignments of long reads. Alignments spanning reference alleles and those spanning alternative alleles are shown in the same view. Users can customize the view if they are not satisfied with the preset views. In addition, MoMI-G has the Interval Card Deck, a feature for rapid manual inspection of hundreds of structural variants. Herein, we describe the utility of MoMI-G by using representative examples of large and nested structural variations found in two cell lines, LC-2/ad and CHM1. Conclusions: Users can inspect complex and large structural variations found by long-read analysis in large genomes such as human genomes more smoothly and more intuitively. In addition, users can easily filter out false positives by manually inspecting hundreds of identified structural variants with supporting long-read alignments and annotations in a short time.
Software availability: MoMI-G is freely available at https://github.com/MoMI-G/MoMI-G under the MIT license.
Affiliation(s)
- Toshiyuki T Yokoyama
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Yoshitaka Sakamoto
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Masahide Seki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Masahiro Kasahara
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
|
392
|
Alonge M, Soyk S, Ramakrishnan S, Wang X, Goodwin S, Sedlazeck FJ, Lippman ZB, Schatz MC. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol 2019; 20:224. [PMID: 31661016 PMCID: PMC6816165 DOI: 10.1186/s13059-019-1829-6] [Citation(s) in RCA: 402] [Impact Index Per Article: 67.0] [Received: 02/01/2019] [Accepted: 09/19/2019] [Indexed: 01/10/2023] Open
Abstract
We present RaGOO, a reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in minutes. After the pseudomolecules are constructed, RaGOO identifies structural variants, including those spanning sequencing gaps. We show that RaGOO accurately orders and orients 3 de novo tomato genome assemblies, including the widely used M82 reference cultivar. We then demonstrate the scalability and utility of RaGOO with a pan-genome analysis of 103 Arabidopsis thaliana accessions by examining the structural variants detected in the newly assembled pseudomolecules. RaGOO is available open source at https://github.com/malonge/RaGOO.
Affiliation(s)
- Michael Alonge
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Sebastian Soyk
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Xingang Wang
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Sara Goodwin
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Zachary B Lippman
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
  - Cold Spring Harbor Laboratory, Howard Hughes Medical Institute, Cold Spring Harbor, NY, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
|
393
|
Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, Töpfer A, Alonge M, Mahmoud M, Qian Y, Chin CS, Phillippy AM, Schatz MC, Myers G, DePristo MA, Ruan J, Marschall T, Sedlazeck FJ, Zook JM, Li H, Koren S, Carroll A, Rank DR, Hunkapiller MW. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 2019; 37:1155-1162. [PMID: 31406327 PMCID: PMC6776680 DOI: 10.1038/s41587-019-0217-9] [Citation(s) in RCA: 980] [Impact Index Per Article: 163.3] [Received: 01/23/2019] [Accepted: 07/08/2019] [Indexed: 12/30/2022]
Abstract
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
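The contig N50 quoted in this abstract is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that calculation follows; the function name `n50` and the contig lengths are hypothetical illustrations and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))
```

Because the statistic depends only on the sorted lengths, a few very long contigs can raise the N50 substantially even when many short contigs remain, which is why it is usually reported alongside an accuracy measure such as the concordance mentioned above.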
Affiliation(s)
- Jana Ebler
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  - Max Planck Institute for Informatics, Saarbrücken, Germany
  - Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
- Nathan D Olson
  - National Institute of Standards and Technology, Gaithersburg, MD, USA
- Michael Alonge
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Gene Myers
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Jue Ruan
  - Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Tobias Marschall
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  - Max Planck Institute for Informatics, Saarbrücken, Germany
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Justin M Zook
  - National Institute of Standards and Technology, Gaithersburg, MD, USA
- Heng Li
  - Dana-Farber Cancer Institute, Boston, MA, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
|
394
|
Bravo Ruiz G, Ross ZK, Holmes E, Schelenz S, Gow NAR, Lorenz A. Rapid and extensive karyotype diversification in haploid clinical Candida auris isolates. Curr Genet 2019; 65:1217-1228. [PMID: 31020384 PMCID: PMC6744574 DOI: 10.1007/s00294-019-00976-w] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 02/20/2019] [Revised: 04/09/2019] [Accepted: 04/15/2019] [Indexed: 11/30/2022]
Abstract
Candida auris is a newly emerged pathogenic microbe, having been identified as a medically relevant fungus as recently as 2009. It is one of the most drug-resistant yeast species known to date and its emergence and population structure are unusual. Because of its recent emergence, we are largely ignorant about fundamental aspects of its general biology, life cycle, and population dynamics. Here, we report the karyotype variability of 26 C. auris strains representing the four main clades. We demonstrate that all strains are haploid and have a highly plastic karyotype containing five to seven chromosomes, which can undergo marked alterations within a short time frame when the fungus is put under genotoxic, heat, or osmotic stress. No simple correlation was found between karyotype pattern, drug resistance, and clade affiliation indicating that karyotype heterogeneity is rapidly evolving. As with other Candida species, these marked karyotype differences between isolates are likely to have an important impact on pathogenic traits of C. auris.
Affiliation(s)
- Gustavo Bravo Ruiz
  - Institute of Medical Sciences (IMS), University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
- Zoe K Ross
  - Institute of Medical Sciences (IMS), University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
  - MRC Centre for Medical Mycology, University of Aberdeen, Aberdeen, UK
- Eilidh Holmes
  - Institute of Medical Sciences (IMS), University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
- Silke Schelenz
  - Department of Microbiology, Royal Brompton Hospital, London, UK
- Neil A R Gow
  - Institute of Medical Sciences (IMS), University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
  - MRC Centre for Medical Mycology, University of Aberdeen, Aberdeen, UK
  - School of Biosciences, University of Exeter, Exeter, UK
- Alexander Lorenz
  - Institute of Medical Sciences (IMS), University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, UK
|
395
|
Tusso S, Nieuwenhuis BPS, Sedlazeck FJ, Davey JW, Jeffares DC, Wolf JBW. Ancestral Admixture Is the Main Determinant of Global Biodiversity in Fission Yeast. Mol Biol Evol 2019; 36:1975-1989. [PMID: 31225876 PMCID: PMC6736153 DOI: 10.1093/molbev/msz126] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 12/12/2022] Open
Abstract
Mutation and recombination are key evolutionary processes governing phenotypic variation and reproductive isolation. We here demonstrate that biodiversity within all globally known strains of Schizosaccharomyces pombe arose through admixture between two divergent ancestral lineages. Initial hybridization was inferred to have occurred ∼20-60 sexual outcrossing generations ago consistent with recent, human-induced migration at the onset of intensified transcontinental trade. Species-wide heritable phenotypic variation was explained near-exclusively by strain-specific arrangements of alternating ancestry components with evidence for transgressive segregation. Reproductive compatibility between strains was likewise predicted by the degree of shared ancestry. To assess the genetic determinants of ancestry block distribution across the genome, we characterized the type, frequency, and position of structural genomic variation using nanopore and single-molecule real-time sequencing. Despite being associated with double-strand break initiation points, over 800 segregating structural variants exerted overall little influence on the introgression landscape or on reproductive compatibility between strains. In contrast, we found strong ancestry disequilibrium consistent with negative epistatic selection shaping genomic ancestry combinations during the course of hybridization. This study provides a detailed, experimentally tractable example that genomes of natural populations are mosaics reflecting different evolutionary histories. Exploiting genome-wide heterogeneity in the history of ancestral recombination and lineage-specific mutations sheds new light on the population history of S. pombe and highlights the importance of hybridization as a creative force in generating biodiversity.
Affiliation(s)
- Sergio Tusso
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
  - Department of Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
- Bart P S Nieuwenhuis
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX
- John W Davey
  - Bioscience Technology Facility, Department of Biology, University of York, York, United Kingdom
- Daniel C Jeffares
  - Department of Biology, University of York, York, United Kingdom
  - York Biomedical Research Institute (YBRI), University of York, York, United Kingdom
- Jochen B W Wolf
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
  - Department of Evolutionary Biology, Science for Life Laboratories, Uppsala University, Uppsala, Sweden
|
396
|
Zhou Y, Minio A, Massonnet M, Solares E, Lv Y, Beridze T, Cantu D, Gaut BS. The population genetics of structural variants in grapevine domestication. Nat Plants 2019; 5:965-979. [PMID: 31506640 DOI: 10.1038/s41477-019-0507-8] [Citation(s) in RCA: 175] [Impact Index Per Article: 29.2] [Received: 05/13/2019] [Accepted: 07/26/2019] [Indexed: 05/20/2023]
Abstract
Structural variants (SVs) are a largely unexplored feature of plant genomes. Little is known about the type and size of SVs, their distribution among individuals and, especially, their population dynamics. Understanding these dynamics is critical for understanding both the contributions of SVs to phenotypes and the likelihood of identifying them as causal genetic variants in genome-wide associations. Here, we identify SVs and study their evolutionary genomics in clonally propagated grapevine cultivars and their outcrossing wild progenitors. To catalogue SVs, we assembled the highly heterozygous Chardonnay genome, for which one in seven genes is hemizygous based on SVs. Using an integrative comparison between Chardonnay and Cabernet Sauvignon genomes by whole-genome, long-read and short-read alignment, we extended SV detection to population samples. We found that strong purifying selection acts against SVs but particularly against inversion and translocation events. SVs nonetheless accrue as recessive heterozygotes in clonally propagated lineages. They also define outlier regions of genomic divergence between wild and cultivated grapevines, suggesting roles in domestication. Outlier regions include the sex-determination region and the berry colour locus, where independent large, complex inversions have driven convergent phenotypic evolution.
Affiliation(s)
- Yongfeng Zhou
  - Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA, USA
- Andrea Minio
  - Department of Viticulture and Enology, UC Davis, Davis, CA, USA
- Edwin Solares
  - Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA, USA
- Yuanda Lv
  - Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA, USA
- Tengiz Beridze
  - Institute of Molecular Genetics, Agricultural University of Georgia, Tbilisi, Georgia
- Dario Cantu
  - Department of Viticulture and Enology, UC Davis, Davis, CA, USA
- Brandon S Gaut
  - Department of Ecology and Evolutionary Biology, UC Irvine, Irvine, CA, USA
|
397
|
Fleiss A, O'Donnell S, Fournier T, Lu W, Agier N, Delmas S, Schacherer J, Fischer G. Reshuffling yeast chromosomes with CRISPR/Cas9. PLoS Genet 2019; 15:e1008332. [PMID: 31465441 PMCID: PMC6738639 DOI: 10.1371/journal.pgen.1008332] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 05/14/2019] [Revised: 09/11/2019] [Accepted: 07/26/2019] [Indexed: 12/12/2022] Open
Abstract
Genome engineering is a powerful approach to study how chromosomal architecture impacts phenotypes. However, quantifying the fitness impact of translocations independently from the confounding effect of base substitutions has so far remained challenging. We report a novel application of the CRISPR/Cas9 technology that generates, with high efficiency, both uniquely targeted and multiple concomitant reciprocal translocations in the yeast genome. Targeted translocations are constructed by inducing two double-strand breaks (DSBs) on different chromosomes and forcing the trans-chromosomal repair through homologous recombination by chimerical donor DNAs. Multiple translocations are generated by inducing several DSBs in LTR repeated sequences and promoting repair using endogenous uncut LTR copies as template. All engineered translocations are markerless and scarless. Targeted translocations are produced at base pair resolution and can be sequentially generated one after the other. Multiple translocations result in a large diversity of karyotypes and are associated in many instances with the formation of unanticipated segmental duplications. To test the phenotypic impact of translocations, we first recapitulated in a lab strain the SSU1/ECM34 translocation providing increased sulphite resistance to wine isolates. Surprisingly, the same translocation in a laboratory strain resulted in decreased sulphite resistance. However, adding the repeated sequences that are present in the SSU1 promoter of the resistant wine strain induced sulphite resistance in the lab strain, yet to a lower level than that of the wine isolate, implying that additional polymorphisms also contribute to the phenotype. These findings illustrate the advantage brought by our technique to untangle the phenotypic impacts of structural variations from confounding effects of base substitutions. Secondly, we showed that strains with multiple translocations, even those devoid of unanticipated segmental duplications, display large phenotypic diversity in a wide range of environmental conditions, showing that simply reconfiguring chromosome architecture is sufficient to provide fitness advantages in stressful growth conditions.
Author summary
Chromosomes are highly dynamic objects that often undergo large structural variations such as reciprocal translocations. Such rearrangements can have dramatic functional consequences, as they can disrupt genes, change their regulation or create novel fusion genes at their breakpoints. For instance, 90–95% of patients diagnosed with chronic myeloid leukemia carry the Philadelphia chromosome characterized by a reciprocal translocation between chromosomes 9 and 22. In addition, translocations reorganize the genetic information along chromosomes, which in turn can modify the 3D architecture of the genome and potentially affect its functioning. Quantifying the fitness impact of translocations independently from the confounding effect of base substitutions has so far remained challenging. Here, we report a novel CRISPR/Cas9-based technology that generates, with high efficiency and at base-pair precision, either uniquely targeted or multiple reciprocal translocations in yeast, without leaving any marker or scar in the genome. Engineering targeted reciprocal translocations allowed us for the first time to untangle the phenotypic impacts of large chromosomal rearrangements from that of point mutations. In addition, the generation of multiple translocations led to a large reorganization of the genetic information along the chromosomes, often including unanticipated large segmental duplications. We showed that reshuffling the genome resulted in the emergence of fitness advantage in stressful environmental conditions, even in strains where no gene was disrupted or amplified by the translocations.
Affiliation(s)
- Aubin Fleiss
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Samuel O'Donnell
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Téo Fournier
  - Université de Strasbourg, CNRS, GMGM UMR7156, Strasbourg, France
- Wenqing Lu
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Nicolas Agier
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Stéphane Delmas
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
- Gilles Fischer
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
|
398
|
Sedlazeck FJ, Lee H, Darby CA, Schatz MC. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet 2018; 19:329-346. [PMID: 29599501 DOI: 10.1038/s41576-018-0003-4] [Citation(s) in RCA: 320] [Impact Index Per Article: 53.3] [Indexed: 01/11/2023]
Abstract
Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.
Affiliation(s)
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Hayan Lee
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Charlotte A Darby
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
|
399
|
Wang A, Wang Z, Li Z, Li LM. BAUM: improving genome assembly by adaptive unique mapping and local overlap-layout-consensus approach. Bioinformatics 2018; 34:2019-2028. [PMID: 29346504 DOI: 10.1093/bioinformatics/bty020] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/29/2017] [Accepted: 01/12/2018] [Indexed: 11/13/2022] Open
Abstract
Motivation: It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though single-molecule real-time sequencing technology is very promising for overcoming the uncertainty issue, its relatively high cost and error rate add a burden on budget or computation. Many long-read assemblers adopt the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Results: Aiming at minimizing uncertainty, the proposed method, BAUM, breaks the whole genome into regions by adaptive unique mapping; then the local OLC is used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a close species or (ii) improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement on genome size and continuity. Besides, BAUM reconstructed a considerable amount of repetitive regions that failed to be assembled by existing short read assemblers. We also propose statistical approaches to control the uncertainty in different steps of BAUM. Availability and implementation: http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anqi Wang
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhanyu Wang
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zheng Li
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Lei M Li
  - National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
|
400
|
De Coster W, De Rijk P, De Roeck A, De Pooter T, D'Hert S, Strazisar M, Sleegers K, Van Broeckhoven C. Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome. Genome Res 2019; 29:1178-1187. [PMID: 31186302 PMCID: PMC6633254 DOI: 10.1101/gr.244939.118] [Citation(s) in RCA: 97] [Impact Index Per Article: 16.2] [Received: 10/22/2018] [Accepted: 06/06/2019] [Indexed: 01/17/2023]
Abstract
We sequenced the genome of the Yoruban reference individual NA19240 on the long-read sequencing platform Oxford Nanopore PromethION for evaluation and benchmarking of recently published aligners and germline structural variant calling tools, as well as a comparison with the performance of structural variant calling from short-read sequencing data. The structural variant caller Sniffles after NGMLR or minimap2 alignment provides the most accurate results, but additional confidence or sensitivity can be obtained by a combination of multiple variant callers. Sensitive and fast results can be obtained by minimap2 for alignment and a combination of Sniffles and SVIM for variant identification. We describe a scalable workflow for identification, annotation, and characterization of tens of thousands of structural variants from long-read genome sequencing of an individual or population. By discussing the results of this well-characterized reference individual, we provide an approximation of what can be expected in future long-read sequencing studies aiming for structural variant identification.
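The trade-off noted in this abstract, where combining callers yields either more confidence or more sensitivity, can be illustrated with a small sketch: keeping only calls supported by both callers (intersection) favours confidence, while keeping calls from either caller (union) favours sensitivity. Everything below is a hypothetical illustration and not the workflow described in the paper: the tuple layout, the function names, and the 50% reciprocal-overlap threshold are assumptions, and the example coordinates are invented.

```python
from typing import List, Tuple

# Hypothetical call representation: (chromosome, start, end, SV type).
SV = Tuple[str, int, int, str]


def reciprocal_overlap(a: SV, b: SV, min_frac: float = 0.5) -> bool:
    """True if two calls share chromosome and type and overlap by at
    least min_frac of the length of each call."""
    if a[0] != b[0] or a[3] != b[3]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a[2] - a[1])
            and overlap >= min_frac * (b[2] - b[1]))


def intersect_calls(calls_a: List[SV], calls_b: List[SV],
                    min_frac: float = 0.5) -> List[SV]:
    """Calls from the first set that are supported by the second set."""
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b, min_frac) for b in calls_b)]


def union_calls(calls_a: List[SV], calls_b: List[SV],
                min_frac: float = 0.5) -> List[SV]:
    """All calls from the first set plus unmatched calls from the second."""
    extra = [b for b in calls_b
             if not any(reciprocal_overlap(a, b, min_frac) for a in calls_a)]
    return calls_a + extra


# Invented example calls from two hypothetical callers.
caller_a = [("chr1", 1000, 2000, "DEL"), ("chr1", 5000, 5600, "DUP")]
caller_b = [("chr1", 1100, 2050, "DEL"), ("chr2", 300, 900, "DEL")]

print(intersect_calls(caller_a, caller_b))
print(len(union_calls(caller_a, caller_b)))
```

Interval overlap is a reasonable matching rule for deletions and duplications; insertions occupy a single reference position and would need a distance-based rule instead, which this sketch does not attempt.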
Affiliation(s)
- Wouter De Coster
  - Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
  - Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Peter De Rijk
  - Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
  - Neuromics Support Facility, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Arne De Roeck
  - Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
  - Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Tim De Pooter
  - Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
  - Neuromics Support Facility, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Svenn D'Hert
  - Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
  - Neuromics Support Facility, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Mojca Strazisar
  - Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
  - Neuromics Support Facility, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Kristel Sleegers
  - Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
  - Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Christine Van Broeckhoven
  - Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
  - Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
|