1
Zhu Y, Lee BJ, Fujii S, Jonchhe S, Zhang H, Li A, Wang KJ, Rothenberg E, Modesti M, Zha S. The KU70-SAP domain has an overlapping function with DNA-PKcs in limiting the lateral movement of KU along DNA. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.26.609806. [PMID: 39253422 PMCID: PMC11383278 DOI: 10.1101/2024.08.26.609806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
The non-homologous end-joining (NHEJ) pathway is critical for DNA double-strand break repair and is essential for lymphocyte development and maturation. The Ku70/Ku80 heterodimer (KU) binds to DNA ends, initiating NHEJ and recruiting additional factors, including DNA-dependent protein kinase catalytic subunit (DNA-PKcs) that caps the ends and pushes KU inward. The C-terminus of Ku70 in higher eukaryotes includes a flexible linker and a SAP domain, whose physiological role remains poorly understood. To investigate this, we generated a mouse model with knock-in deletion of the SAP domain (Ku70 ΔSAP/ΔSAP). Ku70 ΔSAP supports KU stability and its recruitment to DNA damage sites in vivo. In contrast to the growth retardation and immunodeficiency seen in Ku70 -/- mice, Ku70 ΔSAP/ΔSAP mice show no defects in lymphocyte development and maturation. Structural modeling of KU on long dsDNA, but not dsRNA, suggests that the SAP domain can bind to an adjacent major groove, where it can limit KU's rotation and lateral movement along the dsDNA. Accordingly, in the absence of DNA-PKcs that caps the ends, Ku70 ΔSAP fails to support stable DNA damage-induced KU foci. In DNA-PKcs -/- mice, Ku70 ΔSAP abrogates the leaky T cell development and reduces both the qualitative and quantitative aspects of residual V(D)J recombination. In the absence of DNA-PKcs, purified Ku70 ΔSAP has reduced affinity for DNA ends, dissociates more readily at lower concentration, and accumulates as multimers at high concentration. These findings reveal a physiological role of the SAP domain in NHEJ by restricting KU rotation and lateral movement on DNA that is largely masked by DNA-PKcs.
Highlight
Ku70 is a conserved non-homologous end-joining (NHEJ) factor. Using genetically engineered mouse models and biochemical analyses, our study uncovered a previously unappreciated role of the C-terminal SAP domain of Ku70 in limiting the lateral movement of KU on DNA ends and ensuring end protection. The presence of DNA-PKcs partially masks this role of the SAP domain.
2
Wang Z, Zhang W, Zhou Y, Zhang Q, Kulkarni KP, Melmaiee K, Tian Y, Dong M, Gao Z, Su Y, Yu H, Xu G, Li Y, He H, Liu Q, Sun H. Genetic and epigenetic signatures for improved breeding of cultivated blueberry. HORTICULTURE RESEARCH 2024; 11:uhae138. [PMID: 38988623 PMCID: PMC11233858 DOI: 10.1093/hr/uhae138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 05/05/2024] [Indexed: 07/12/2024]
Abstract
Blueberry belongs to the Vaccinium genus and is a highly popular fruit crop with significant economic importance. It was not until the early twentieth century that they began to be domesticated through extensive interspecific hybridization. Here, we collected 220 Vaccinium accessions from various geographical locations, including 154 from the United States, 14 from China, eight from Australia, and 29 from Europe and other countries, comprising 164 Vaccinium corymbosum, 15 Vaccinium ashei, 10 lowbush blueberries, seven half-high blueberries, and others. We present the whole-genome variation map of 220 accessions and reconstructed the hundred-year molecular history of interspecific hybridization of blueberry. We focused on the two major blueberry subgroups, the northern highbush blueberry (NHB) and southern highbush blueberry (SHB) and identified candidate genes that contribute to their distinct traits in climate adaptability and fruit quality. Our analysis unveiled the role of gene introgression from Vaccinium darrowii and V. ashei into SHB in driving the differentiation between SHB and NHB, potentially facilitating SHB's adaptation to subtropical environments. Assisted by genome-wide association studies, our analysis suggested VcTBL44 as a pivotal gene regulator governing fruit firmness in SHB. Additionally, we conducted whole-genome bisulfite sequencing on nine NHB and 12 SHB cultivars, and characterized regions that are differentially methylated between the two subgroups. In particular, we discovered that the β-alanine metabolic pathway genes were enriched for DNA methylation changes. Our study provides high-quality genetic and epigenetic variation maps for blueberry, which offer valuable insights and resources for future blueberry breeding.
Affiliation(s)
- Zejia Wang
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing 100871, China
- Wanchen Zhang
- Jilin Provincial Laboratory of Crop Germplasm Resources, College of Horticulture, Jilin Agricultural University, No. 2888 Xincheng Street, Economic Development District, Changchun 130118, China
- Yangyan Zhou
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing 100871, China
- Qiyan Zhang
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing 100871, China
- Krishnanand P Kulkarni
- Department of Agriculture and Natural Resources, Delaware State University, Dover, DE 19901, USA
- Kalpalatha Melmaiee
- Department of Agriculture and Natural Resources, Delaware State University, Dover, DE 19901, USA
- Youwen Tian
- Jilin Provincial Laboratory of Crop Germplasm Resources, College of Horticulture, Jilin Agricultural University, No. 2888 Xincheng Street, Economic Development District, Changchun 130118, China
- Mei Dong
- Jilin Provincial Laboratory of Crop Germplasm Resources, College of Horticulture, Jilin Agricultural University, No. 2888 Xincheng Street, Economic Development District, Changchun 130118, China
- Zhaoxu Gao
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing 100871, China
- Yanning Su
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing 100871, China
- Hong Yu
- Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing 210014, China
- Guohui Xu
- College of Life and Health, Dalian University, Dalian 116622, China
- Yadong Li
- Jilin Provincial Laboratory of Crop Germplasm Resources, College of Horticulture, Jilin Agricultural University, No. 2888 Xincheng Street, Economic Development District, Changchun 130118, China
- Hang He
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing 100871, China
- Qikun Liu
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, No.5 Yiheyuan Road, Haidian District, Beijing 100871, China
- Haiyue Sun
- Jilin Provincial Laboratory of Crop Germplasm Resources, College of Horticulture, Jilin Agricultural University, No. 2888 Xincheng Street, Economic Development District, Changchun 130118, China
3
Horvath R, Minadakis N, Bourgeois Y, Roulin AC. The evolution of transposable elements in Brachypodium distachyon is governed by purifying selection, while neutral and adaptive processes play a minor role. eLife 2024; 12:RP93284. [PMID: 38606833 PMCID: PMC11014726 DOI: 10.7554/elife.93284] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 04/13/2024] Open
Abstract
Understanding how plants adapt to changing environments and the potential contribution of transposable elements (TEs) to this process is a key question in evolutionary genomics. While TEs have recently been put forward as active players in the context of adaptation, few studies have thoroughly investigated their precise role in plant evolution. Here, we used the wild Mediterranean grass Brachypodium distachyon as a model species to identify and quantify the forces acting on TEs during the adaptation of this species to various conditions, across its entire geographic range. Using sequencing data from more than 320 natural B. distachyon accessions and a suite of population genomics approaches, we reveal that putatively adaptive TE polymorphisms are rare in wild B. distachyon populations. After accounting for changes in past TE activity, we show that only a small proportion of TE polymorphisms evolved neutrally (<10%), while the vast majority of them are under moderate purifying selection regardless of their distance to genes. TE polymorphisms should not be ignored when conducting evolutionary studies, as they can be linked to adaptation. However, our study clearly shows that while they have a large potential to cause phenotypic variation in B. distachyon, they are not favored during evolution and adaptation over other types of mutations (such as point mutations) in this species.
Affiliation(s)
- Robert Horvath
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Nikolaos Minadakis
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Yann Bourgeois
- DIADE, University of Montpellier, CIRAD, IRD, Montpellier, France
- University of Portsmouth, Portsmouth, United Kingdom
- Anne C Roulin
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
4
LoTempio J, Delot E, Vilain E. Benchmarking long-read genome sequence alignment tools for human genomics applications. PeerJ 2023; 11:e16515. [PMID: 38130927 PMCID: PMC10734412 DOI: 10.7717/peerj.16515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 11/02/2023] [Indexed: 12/23/2023] Open
Abstract
Background
The utility of long-read genome sequencing platforms has been shown in many fields including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation.
Results
For this study, we leveraged publicly available data from sample NA12878 generated on Oxford Nanopore and sample NA24385 on Pacific Biosciences platforms. We employed state-of-the-art sequence alignment tools including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR took a long time and required many resources, but produced alignments each time. LRA was fast, but only worked on Pacific Biosciences data. Each tool widely disagreed on which reads to leave unaligned, affecting the end genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001-100,000 base pairs) present in the Database of Genome Variants (DGV) for sample NA12878 or the truthset for NA24385.
Conclusions
These results suggest a combined approach is needed for LRS alignments for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. It should be best practice to use an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2 as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. If computational resources and time are not a factor for a given case or experiment, NGMLR will provide another view, and another chance to resolve a case. LRA, while fast, did not work on the nanopore data for our cluster, but PacBio results were promising in that those computations completed faster than Minimap2. Due to its significant burden on computational resources and slow run time, GraphMap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.
Affiliation(s)
- Jonathan LoTempio
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
- Emmanuele Delot
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States of America
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States of America
- Eric Vilain
- Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
- International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
5
Agustinho DP, Brown HL, Chen G, Gaylord EA, Geddes-McAlister J, Brent MR, Doering TL. Unbiased discovery of natural sequence variants that influence fungal virulence. Cell Host Microbe 2023; 31:1910-1920.e5. [PMID: 37898126 PMCID: PMC10842055 DOI: 10.1016/j.chom.2023.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 08/18/2023] [Accepted: 10/02/2023] [Indexed: 10/30/2023]
Abstract
Isolates of Cryptococcus neoformans, a fungal pathogen that kills over 112,000 people each year, differ from a 19-Mb reference genome at a few thousand up to almost a million DNA sequence positions. We used bulked segregant analysis and association analysis, genetic methods that require no prior knowledge of sequence function, to address the key question of which naturally occurring sequence variants influence fungal virulence. We identified a region containing such variants, prioritized them, and engineered strains to test our findings in a mouse model of infection. At one locus, we identified a 4-nt variant in the PDE2 gene that occurs in common laboratory strains and severely truncates the encoded phosphodiesterase. The resulting loss of phosphodiesterase activity significantly impacts virulence. Our studies demonstrate a powerful and unbiased strategy for identifying key genomic regions in the absence of prior information and provide significant sequence and strain resources to the community.
Affiliation(s)
- Daniel Paiva Agustinho
- Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Holly Leanne Brown
- Department of Computer Science & Engineering, Washington University, St. Louis, MO 63130, USA
- Guohua Chen
- Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Elizabeth Anne Gaylord
- Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Michael Richard Brent
- Department of Computer Science & Engineering, Washington University, St. Louis, MO 63130, USA; Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Tamara Lea Doering
- Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110, USA
6
Karunarathne P, Zhou Q, Schliep K, Milesi P. A comprehensive framework for detecting copy number variants from single nucleotide polymorphism data: 'rCNV', a versatile r package for paralogue and CNV detection. Mol Ecol Resour 2023; 23:1772-1789. [PMID: 37515483 DOI: 10.1111/1755-0998.13843] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/17/2022] [Revised: 07/04/2023] [Accepted: 07/07/2023] [Indexed: 07/31/2023]
Abstract
Recent studies have highlighted the significant role of copy number variants (CNVs) in phenotypic diversity, environmental adaptation and species divergence across eukaryotes. The presence of CNVs also has the potential to introduce genotyping biases, which can pose challenges to accurate population and quantitative genetic analyses. However, detecting CNVs in genomes, particularly in non-model organisms, presents a formidable challenge. To address this issue, we have developed a statistical framework and an accompanying r software package that leverage allelic-read depth from single nucleotide polymorphism (SNP) data for accurate CNV detection. Our framework capitalises on two key principles. First, it exploits the distribution of allelic-read depth ratios in heterozygotes for individual SNPs by comparing it against an expected distribution based on binomial sampling. Second, it identifies SNPs exhibiting an apparent excess of heterozygotes under Hardy-Weinberg equilibrium. By employing multiple statistical tests, our method not only enhances sensitivity to sampling effects but also effectively addresses reference biases, resulting in optimised SNP classification. Our framework is compatible with various NGS technologies (e.g. RADseq, Exome-capture). This versatility enables CNV calling from genomes of diverse complexities. To streamline the analysis process, we have implemented our framework in the user-friendly r package 'rCNV', which automates the entire workflow seamlessly. We trained our models using simulated data and validated their performance on four datasets derived from different sequencing technologies, including RADseq (Chinook salmon-Oncorhynchus tshawytscha), Rapture (American lobster-Homarus americanus), Exome-capture (Norway spruce-Picea abies) and WGS (Malaria mosquito-Anopheles gambiae).
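The two principles named in this abstract can be illustrated with a minimal Python sketch. It is not the rCNV implementation: the function names, the exact two-sided binomial test, and the simple observed-minus-expected heterozygosity statistic are assumptions chosen for illustration. At a single-copy SNP, reads from a heterozygote should split roughly 50:50 between alleles, and the heterozygote frequency should not exceed the Hardy-Weinberg expectation 2pq by much.

```python
from math import comb


def binom_two_sided_p(alt_reads, total_reads, p=0.5):
    """Exact two-sided binomial p-value for the alternate-allele read count.

    Sums the probability of every outcome that is no more likely than the
    observed one (hypothetical helper, not part of rCNV).
    """
    pmf = [comb(total_reads, i) * p**i * (1 - p) ** (total_reads - i)
           for i in range(total_reads + 1)]
    observed = pmf[alt_reads]
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


def het_excess(n_hom_ref, n_het, n_hom_alt):
    """Observed minus expected heterozygote frequency under Hardy-Weinberg."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    return n_het / n - 2 * p * (1 - p)


# A heterozygote with 18 of 20 reads on one allele deviates strongly from 50:50.
print(binom_two_sided_p(18, 20))
# Every sampled individual heterozygous: a large excess over 2pq.
print(het_excess(0, 100, 0))
```

A SNP flagged by both signals (skewed allelic depth in heterozygotes and a heterozygote excess) would be a candidate for lying in a duplicated region; any thresholds applied to these values would need to be chosen separately.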
Affiliation(s)
- Piyal Karunarathne
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala, Sweden
- Institute of Population Genetics, Heinrich Heine University, Düsseldorf, Germany
- Qiujie Zhou
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala, Sweden
- Klaus Schliep
- Institute of Computational Biotechnology, Graz University of Technology, Graz, Austria
- Pascal Milesi
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala, Sweden
7
Menolfi D, Lee BJ, Zhang H, Jiang W, Bowen NE, Wang Y, Zhao J, Holmes A, Gershik S, Rabadan R, Kim B, Zha S. ATR kinase supports normal proliferation in the early S phase by preventing replication resource exhaustion. Nat Commun 2023; 14:3618. [PMID: 37336885 DOI: 10.1038/s41467-023-39332-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/16/2022] [Accepted: 06/08/2023] [Indexed: 06/21/2023] Open
Abstract
The ATR kinase, which coordinates cellular responses to DNA replication stress, is also essential for the proliferation of normal unstressed cells. Although its role in the replication stress response is well defined, the mechanisms by which ATR supports normal cell proliferation remain elusive. Here, we show that ATR is dispensable for the viability of G0-arrested naïve B cells. However, upon cytokine-induced proliferation, Atr-deficient B cells initiate DNA replication efficiently, but by mid-S phase they display dNTP depletion, fork stalling, and replication failure. Nonetheless, productive DNA replication and dNTP levels can be restored in Atr-deficient cells by suppressing origin firing, such as partial inhibition of CDC7 and CDK1 kinase activities. Together, these findings indicate that ATR supports the proliferation of normal unstressed cells by tempering the pace of origin firing during the early S phase to avoid exhaustion of dNTPs and importantly also other replication factors.
Affiliation(s)
- Demis Menolfi
- Institute for Cancer Genetics, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Brian J Lee
- Institute for Cancer Genetics, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Hanwen Zhang
- Institute for Cancer Genetics, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Wenxia Jiang
- Institute for Cancer Genetics, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Nicole E Bowen
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Yunyue Wang
- Institute for Cancer Genetics, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Junfei Zhao
- Program for Mathematical Genomics, Department of Systems Biology, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Antony Holmes
- Institute for Cancer Genetics, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Steven Gershik
- Institute for Cancer Genetics, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Raul Rabadan
- Program for Mathematical Genomics, Department of Systems Biology, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Baek Kim
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Shan Zha
- Institute for Cancer Genetics, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Department of Pathology and Cell Biology, Herbert Irving Comprehensive Cancer Center, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Division of Pediatric Hematology, Oncology and Stem Cell Transplantation, Department of Pediatrics, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
- Department of Immunology and Microbiology, Vagelos College for Physicians and Surgeons, Columbia University, New York City, NY, 10032, USA
8
Sahlin K, Baudeau T, Cazaux B, Marchet C. A survey of mapping algorithms in the long-reads era. Genome Biol 2023; 24:133. [PMID: 37264447 PMCID: PMC10236595 DOI: 10.1186/s13059-023-02972-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 08/03/2022] [Accepted: 05/12/2023] [Indexed: 06/03/2023] Open
Abstract
It has been over a decade since the first publication of a method dedicated entirely to mapping long-reads. The distinctive characteristics of long reads resulted in methods moving from the seed-and-extend framework used for short reads to a seed-and-chain framework due to the seed abundance in each read. The main novelties are based on alternative seed constructs or chaining formulations. Dozens of tools now exist, whose heuristics have evolved considerably. We provide an overview of the methods used in long-read mappers. Since they are driven by implementation-specific parameters, we develop an original visualization tool to understand the parameter settings ( http://bcazaux.polytech-lille.net/Minimap2/ ).
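The seed-and-chain idea mentioned in this abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the algorithm of any particular mapper: seeds here are plain exact k-mer matches (real tools typically use minimizers or other sampled seeds), the chaining score simply counts anchors, and `kmer_anchors`, `best_chain`, and `max_gap` are hypothetical names.

```python
def kmer_anchors(read, ref, k):
    """Return sorted (read_pos, ref_pos) pairs where a k-mer matches exactly."""
    index = {}
    for j in range(len(ref) - k + 1):
        index.setdefault(ref[j:j + k], []).append(j)
    return sorted((i, j)
                  for i in range(len(read) - k + 1)
                  for j in index.get(read[i:i + k], []))


def best_chain(anchors, max_gap=50):
    """Quadratic-time DP for the longest co-linear chain of anchors.

    An anchor may follow another only if it lies further along in both the
    read and the reference and within max_gap positions of it.
    """
    if not anchors:
        return []
    score = [1] * len(anchors)
    prev = [-1] * len(anchors)
    for b, (ib, jb) in enumerate(anchors):
        for a in range(b):
            ia, ja = anchors[a]
            colinear = ia < ib and ja < jb
            close = ib - ia <= max_gap and jb - ja <= max_gap
            if colinear and close and score[a] + 1 > score[b]:
                score[b] = score[a] + 1
                prev[b] = a
    end = max(range(len(anchors)), key=score.__getitem__)
    chain = []
    while end != -1:  # walk back through predecessors
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1]


ref = "ACGTACGGTCATTGCA"
read = "GGTCATTG"
print(best_chain(kmer_anchors(read, ref, 4)))
```

The chain identifies the candidate mapping location; a full mapper would then fill the gaps between chained anchors with base-level alignment and use a gap-aware chaining score rather than a simple anchor count.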
Affiliation(s)
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91, Stockholm, Sweden
- Thomas Baudeau
- Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000, Lille, France
- Bastien Cazaux
- Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000, Lille, France
- Camille Marchet
- Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000, Lille, France
9
Menolfi D, Lee BJ, Zhang H, Jiang W, Bowen NE, Wang Y, Zhao J, Holmes A, Gershik S, Rabadan R, Kim B, Zha S. ATR kinase supports normal proliferation in the early S phase by preventing replication resource exhaustion. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.26.542515. [PMID: 37292881 PMCID: PMC10246007 DOI: 10.1101/2023.05.26.542515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The ATR kinase, which coordinates cellular responses to DNA replication stress, is also essential for the proliferation of normal unstressed cells. Although its role in the replication stress response is well defined, the mechanisms by which ATR supports normal cell proliferation remain elusive. Here, we show that ATR is dispensable for the viability of G0-arrested naïve B cells. However, upon cytokine-induced proliferation, Atr-deficient B cells initiate DNA replication efficiently in early S phase, but by mid-S phase they display dNTP depletion, fork stalling, and replication failure. Nonetheless, productive DNA replication can be restored in Atr-deficient cells by pathways that suppress origin firing, such as downregulation of CDC7 and CDK1 kinase activities. Together, these findings indicate that ATR supports the proliferation of normal unstressed cells by tempering the pace of origin firing during the early S phase to avoid exhaustion of dNTPs and other replication factors.
10
Wei ZG, Fan XG, Zhang H, Zhang XD, Liu F, Qian Y, Zhang SW. kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph. Front Genet 2022; 13:890651. [PMID: 35601495 PMCID: PMC9117619 DOI: 10.3389/fgene.2022.890651] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/06/2022] [Accepted: 04/07/2022] [Indexed: 11/13/2022] Open
Abstract
With the rapid development of single molecular sequencing (SMS) technologies such as PacBio single-molecule real-time and Oxford Nanopore sequencing, the output read length is continuously increasing, which has dramatic potential for cutting-edge genomic applications. Mapping these reads to a reference genome is often the most fundamental and computing-intensive step for downstream analysis. However, these long reads contain more sequencing errors and more frequently span the breakpoints of structural variants (SVs) than shorter reads, leading to many unaligned reads or reads that are only partially aligned by most state-of-the-art mappers. As a result, these methods usually focus on producing local mapping results for the query read rather than obtaining the whole end-to-end alignment. We introduce kngMap, a novel k-mer neighborhood graph-based mapper that is specifically designed to align long noisy SMS reads to a reference sequence. In exhaustive benchmarking experiments on both simulated and real-life SMS datasets comparing kngMap with ten other popular SMS mapping tools (e.g., BLASR, BWA-MEM, and minimap2), we demonstrated that kngMap has higher sensitivity and can align more reads and bases to the reference genome; meanwhile, kngMap can produce consecutive alignments for the whole read and span different categories of SVs in the reads. kngMap is implemented in C++ and supports multi-threading; the source code of kngMap can be downloaded for free at: https://github.com/zhang134/kngMap for academic usage.
Affiliation(s)
- Ze-Gang Wei
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Xing-Guo Fan
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Hao Zhang
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Xiao-Dan Zhang
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Fei Liu
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Yu Qian
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
- *Correspondence: Yu Qian; Shao-Wu Zhang
11
Identifying proximal RNA interactions from cDNA-encoded crosslinks with ShapeJumper. PLoS Comput Biol 2021; 17:e1009632. [PMID: 34905538 PMCID: PMC8670686 DOI: 10.1371/journal.pcbi.1009632] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/22/2021] [Accepted: 11/11/2021] [Indexed: 01/07/2023] Open
Abstract
SHAPE-JuMP is a concise strategy for identifying close-in-space interactions in RNA molecules. Nucleotides in close three-dimensional proximity are crosslinked with a bi-reactive reagent that covalently links the 2'-hydroxyl groups of the ribose moieties. The identities of crosslinked nucleotides are determined using an engineered reverse transcriptase that jumps across crosslinked sites, resulting in a deletion in the cDNA that is detected using massively parallel sequencing. Here we introduce ShapeJumper, a bioinformatics pipeline to process SHAPE-JuMP sequencing data and to accurately identify through-space interactions, as observed in complex JuMP datasets. ShapeJumper identifies proximal interactions with near-nucleotide resolution using an alignment strategy that is optimized to tolerate the unique non-templated reverse-transcription profile of the engineered crosslink-traversing reverse-transcriptase. JuMP-inspired strategies are now poised to replace adapter-ligation for detecting RNA-RNA interactions in most crosslinking experiments.
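The readout described in this abstract, a deletion in the cDNA where the reverse transcriptase jumped across a crosslink, can be illustrated by scanning an alignment's CIGAR string for long deletions. The sketch below is only an illustration of that idea and not the ShapeJumper pipeline: it assumes reads are already aligned in SAM format, and the function name and the `min_len` cutoff are hypothetical.

```python
import re


def long_deletions(pos, cigar, min_len=10):
    """Return 1-based reference intervals (start, end) of long deletions.

    pos is the SAM POS field (1-based leftmost aligned reference position),
    cigar is the SAM CIGAR string. Only operations that consume the
    reference (M, D, N, =, X) advance the reference coordinate.
    """
    intervals = []
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "D" and n >= min_len:
            intervals.append((ref, ref + n - 1))
        if op in "MDN=X":
            ref += n
    return intervals


# A 35-nt deletion is reported; the 3-nt deletion falls below the cutoff.
print(long_deletions(100, "20M35D15M2I10M3D5M"))  # [(120, 154)]
```

The two nucleotides flanking each reported interval would be the candidate crosslinked pair; the abstract notes that the real pipeline uses an alignment strategy tuned to the enzyme's non-templated reverse-transcription profile, which this sketch does not attempt.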
12
Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Technology dictates algorithms: recent developments in read alignment. Genome Biol 2021; 22:249. [PMID: 34446078 PMCID: PMC8390189 DOI: 10.1186/s13059-021-02443-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 10/15/2020] [Accepted: 07/28/2021] [Indexed: 01/08/2023] Open
Abstract
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
Affiliation(s)
- Mohammed Alser
  - Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
  - Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
  - Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Jeremy Rotman
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Dhrithi Deshpande
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
- Kodi Taraszka
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Huwenbo Shi
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Pelin Icer Baykal
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Harry Taegyun Yang
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
  - Bioinformatics Interdepartmental Ph.D. Program, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Victor Xue
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Sergey Knyazev
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Benjamin D Singer
  - Division of Pulmonary and Critical Care Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
  - Department of Biochemistry & Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, USA
  - Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Brunilda Balliu
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
- David Koslicki
  - Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16801, USA
  - Biology Department, Pennsylvania State University, University Park, PA, 16801, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16801, USA
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
  - The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
- Can Alkan
  - Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
  - Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey
- Onur Mutlu
  - Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
  - Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
  - Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Serghei Mangul
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA.
|
13
|
Zhang Z, Luo J, Shang J, Li M, Wu FX, Pan Y, Wang J. Deletion Detection Method Using the Distribution of Insert Size and a Precise Alignment Strategy. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1070-1081. [PMID: 31403441 DOI: 10.1109/tcbb.2019.2934407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Homozygous and heterozygous deletions commonly exist in the human genome. For current structural variation detection tools, it is important to determine whether a deletion is homozygous or heterozygous. However, sequencing errors, micro-homologies, and micro-insertions prevent common alignment tools from identifying accurate breakpoint locations, and often result in the detection of false structural variations. In this study, we present a novel deletion detection tool called Sprites2. Compared with Sprites, Sprites2 makes the following modifications: (1) The distribution of insert size is used in Sprites2, which can identify the type of deletions and improve the accuracy of deletion calls. (2) A precise alignment method based on AGE (an algorithm that simultaneously aligns the 5' and 3' ends of two sequences) is adopted in Sprites2 to identify breakpoints, which helps resolve the problems introduced by sequencing errors, micro-homologies, and micro-insertions. To test and verify the performance of Sprites2, simulated and real datasets are used in our experiments, and Sprites2 is compared with five popular tools. The experimental results show that Sprites2 can improve the performance of deletion detection. Sprites2 can be downloaded from https://github.com/zhangzhen/sprites2.
|
14
|
Milanovic M, Shao Z, Estes VM, Wang XS, Menolfi D, Lin X, Lee BJ, Xu J, Cupo OM, Wang D, Zha S. FATC Domain Deletion Compromises ATM Protein Stability, Blocks Lymphocyte Development, and Promotes Lymphomagenesis. THE JOURNAL OF IMMUNOLOGY 2021; 206:1228-1239. [PMID: 33536256 DOI: 10.4049/jimmunol.2000967] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/20/2020] [Accepted: 01/04/2021] [Indexed: 12/21/2022]
Abstract
Ataxia-telangiectasia mutated (ATM) kinase is a master regulator of the DNA damage response, and loss of ATM leads to primary immunodeficiency and greatly increased risk for lymphoid malignancies. The FATC domain is conserved in phosphatidylinositol-3-kinase-related protein kinases (PIKKs). Truncation mutation in the FATC domain (R3047X) selectively compromised reactive oxygen species-induced ATM activation in cell-free assays. In this article, we show that in mouse models, knock-in ATM-R3057X mutation (Atm RX , corresponding to R3047X in human ATM) severely compromises ATM protein stability and causes T cell developmental defects, B cell Ig class-switch recombination defects, and infertility resembling ATM-null. The residual ATM-R3057X protein retains minimal yet functional measurable DNA damage-induced checkpoint activation and significantly delays lymphomagenesis in Atm RX/RX mice compared with Atm -/- . Together, these results support a physiological role of the FATC domain in ATM protein stability and show that the presence of minimal residual ATM-R3057X protein can prevent growth retardation and delay tumorigenesis without restoring lymphocyte development and fertility.
Affiliation(s)
- Maja Milanovic
  - Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Zhengping Shao
  - Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Verna M Estes
  - Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Xiaobin S Wang
  - Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
  - Department of Pathology and Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Demis Menolfi
  - Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Xiaohui Lin
  - Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Brian J Lee
  - Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Jun Xu
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093
- Olivia M Cupo
  - Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Dong Wang
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093
- Shan Zha
  - Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
  - Department of Pathology and Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
  - Division of Pediatric Oncology, Hematology and Stem Cell Transplantation, Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
  - Department of Immunology and Microbiology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
|
15
|
CtIP-mediated DNA resection is dispensable for IgH class switch recombination by alternative end-joining. Proc Natl Acad Sci U S A 2020; 117:25700-25711. [PMID: 32989150 DOI: 10.1073/pnas.2010972117] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022]
Abstract
To generate antibodies with different effector functions, B cells undergo Immunoglobulin Heavy Chain (IgH) class switch recombination (CSR). The ligation step of CSR is usually mediated by the classical nonhomologous end-joining (cNHEJ) pathway. In cNHEJ-deficient cells, a remarkable ∼25% of CSR can be achieved by the alternative end-joining (A-EJ) pathway that preferentially uses microhomology (MH) at the junctions. While A-EJ-mediated repair of endonuclease-generated breaks requires DNA end resection, we show that CtIP-mediated DNA end resection is dispensable for A-EJ-mediated CSR using cNHEJ-deficient B cells. High-throughput sequencing analyses revealed that loss of ATM/ATR phosphorylation of CtIP at T855 or ATM kinase inhibition suppresses resection without altering the MH pattern of the A-EJ-mediated switch junctions. Moreover, we found that ATM kinase promotes A-EJ-mediated CSR by suppressing interchromosomal translocations independent of end resection. Finally, temporal analyses reveal that MHs are enriched in early internal deletions even in cNHEJ-proficient B cells. Thus, we propose that repetitive IgH switch regions represent favored substrates for MH-mediated end-joining contributing to the robustness and resection independence of A-EJ-mediated CSR.
|
16
|
DNA-PKcs phosphorylation at the T2609 cluster alters the repair pathway choice during immunoglobulin class switch recombination. Proc Natl Acad Sci U S A 2020; 117:22953-22961. [PMID: 32868446 DOI: 10.1073/pnas.2007455117] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 12/17/2022]
Abstract
The DNA-dependent protein kinase (DNA-PK), which is composed of the KU heterodimer and the large catalytic subunit (DNA-PKcs), is a classical nonhomologous end-joining (cNHEJ) factor. Naïve B cells undergo class switch recombination (CSR) to generate antibodies with different isotypes by joining two DNA double-strand breaks at different switching regions via the cNHEJ pathway. DNA-PK and the cNHEJ pathway play important roles in the DNA repair phase of CSR. To initiate cNHEJ, KU binds to DNA ends and recruits and activates DNA-PK. Activated DNA-PK phosphorylates DNA-PKcs at the S2056 and T2609 clusters. Loss of T2609 cluster phosphorylation increases radiation sensitivity but whether T2609 phosphorylation has a role in physiological DNA repair remains elusive. Using the DNA-PKcs 5A mouse model carrying alanine substitutions at the T2609 cluster, here we show that loss of T2609 phosphorylation of DNA-PKcs does not affect the CSR efficiency. Yet, the CSR junctions recovered from DNA-PKcs 5A/5A B cells reveal increased chromosomal translocations, extensive use of distal switch regions (consistent with end resection), and preferential usage of microhomology, all signs of the alternative end-joining pathway. Thus, these results uncover a role of DNA-PKcs T2609 phosphorylation in promoting cNHEJ repair pathway choice during CSR.
|
17
|
Abel HJ, Larson DE, Regier AA, Chiang C, Das I, Kanchi KL, Layer RM, Neale BM, Salerno WJ, Reeves C, Buyske S, Matise TC, Muzny DM, Zody MC, Lander ES, Dutcher SK, Stitziel NO, Hall IM. Mapping and characterization of structural variation in 17,795 human genomes. Nature 2020; 583:83-89. [PMID: 32460305 PMCID: PMC7547914 DOI: 10.1038/s41586-020-2371-0] [Citation(s) in RCA: 182] [Impact Index Per Article: 36.4] [Received: 12/29/2018] [Accepted: 05/18/2020] [Indexed: 12/18/2022]
Abstract
A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.
Affiliation(s)
- Haley J Abel
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
- David E Larson
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
- Allison A Regier
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
- Colby Chiang
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Indraniel Das
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Krishna L Kanchi
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Ryan M Layer
  - BioFrontiers Institute, University of Colorado, Boulder, CO, USA
  - Department of Computer Science, University of Colorado, Boulder, CO, USA
- Benjamin M Neale
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- William J Salerno
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Steven Buyske
  - Department of Statistics, Rutgers University, Piscataway, NJ, USA
- Tara C Matise
  - Department of Genetics, Rutgers University, Piscataway, NJ, USA
- Donna M Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Eric S Lander
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Susan K Dutcher
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
- Nathan O Stitziel
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
  - Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
- Ira M Hall
  - McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA.
  - Department of Genetics, Washington University School of Medicine, St Louis, MO, USA.
  - Department of Medicine, Washington University School of Medicine, St Louis, MO, USA.
|
19
|
Grosenbaugh DK, Joshi S, Fitzgerald MP, Lee KS, Wagley PK, Koeppel AF, Turner SD, McConnell MJ, Goodkin HP. A deletion in Eml1 leads to bilateral subcortical heterotopia in the tish rat. Neurobiol Dis 2020; 140:104836. [PMID: 32179177 PMCID: PMC7814471 DOI: 10.1016/j.nbd.2020.104836] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/17/2019] [Revised: 03/11/2020] [Accepted: 03/12/2020] [Indexed: 12/13/2022]
Abstract
Children with malformations of cortical development (MCD) are at risk for epilepsy, developmental delays, behavioral disorders, and intellectual disabilities. For a subset of these children, antiseizure medications or epilepsy surgery may result in seizure freedom. However, there are limited options for treating or curing the other conditions, and epilepsy surgery is not an option in all cases of pharmacoresistant epilepsy. Understanding the genetic and neurobiological mechanisms underlying MCD is a necessary step in elucidating novel therapeutic targets. The tish (telencephalic internal structural heterotopia) rat is a unique model of MCD with spontaneous seizures, but the underlying genetic mutation(s) have remained unknown. DNA and RNA-sequencing revealed that a deletion encompassing a previously unannotated first exon markedly diminished Eml1 transcript and protein abundance in the tish brain. Developmental electrographic characterization of the tish rat revealed early-onset of spontaneous spike-wave discharge (SWD) bursts beginning at postnatal day (P) 17. A dihybrid cross demonstrated that the mutant Eml1 allele segregates with the observed dysplastic cortex and the early-onset SWD bursts in monogenic autosomal recessive frequencies. Our data link the development of the bilateral, heterotopic dysplastic cortex of the tish rat to a deletion in Eml1.
Affiliation(s)
- Denise K Grosenbaugh
  - Department of Neurology, University of Virginia School of Medicine, Charlottesville, VA, United States
- Suchitra Joshi
  - Department of Neurology, University of Virginia School of Medicine, Charlottesville, VA, United States
- Mark P Fitzgerald
  - Department of Neuroscience, University of Virginia School of Medicine, Charlottesville, VA, United States
- Kevin S Lee
  - Department of Neuroscience, University of Virginia School of Medicine, Charlottesville, VA, United States
  - Department of Neurosurgery, University of Virginia School of Medicine, Charlottesville, VA, United States
  - Center for Brain Immunology and Glia, University of Virginia School of Medicine, Charlottesville, VA, United States
- Pravin K Wagley
  - Department of Neurology, University of Virginia School of Medicine, Charlottesville, VA, United States
- Alexander F Koeppel
  - Center for Public Health Sciences, University of Virginia School of Medicine, Charlottesville, VA, United States
- Stephen D Turner
  - Center for Public Health Sciences, University of Virginia School of Medicine, Charlottesville, VA, United States
- Michael J McConnell
  - Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, United States
  - Department of Neuroscience, University of Virginia School of Medicine, Charlottesville, VA, United States
  - Center for Brain Immunology and Glia, University of Virginia School of Medicine, Charlottesville, VA, United States
  - Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, United States.
- Howard P Goodkin
  - Department of Neurology, University of Virginia School of Medicine, Charlottesville, VA, United States
  - Department of Pediatrics, University of Virginia School of Medicine, Charlottesville, VA, United States.
|
20
|
Shrestha AMS, Yoshikawa N, Asai K. Combining probabilistic alignments with read pair information improves accuracy of split-alignments. Bioinformatics 2019; 34:3631-3637. [PMID: 29790902 PMCID: PMC6198854 DOI: 10.1093/bioinformatics/bty398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2018] [Accepted: 05/13/2018] [Indexed: 11/14/2022]
Abstract
Motivation: Split-alignments provide base-pair-resolution evidence of genomic rearrangements. In practice, they are found by first computing high-scoring local alignments, parts of which are then combined into a split-alignment. This approach is challenging when aligning a short read to a large and repetitive reference, as it tends to produce many spurious local alignments leading to ambiguities in identifying the correct split-alignment. This problem is further exacerbated by the fact that rearrangements tend to occur in repeat-rich regions. Results: We propose a split-alignment technique that combats the issue of ambiguous alignments by combining information from probabilistic alignment with positional information from paired-end reads. We demonstrate that our method finds accurate split-alignments, and that this translates into improved performance of variant-calling tools that rely on split-alignments. Availability and implementation: An open-source implementation is freely available at: https://bitbucket.org/splitpairedend/last-split-pe. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anish M S Shrestha
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, Japan
- Naruki Yoshikawa
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, Japan
- Kiyoshi Asai
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba, Japan
  - Artificial Intelligence Research Center, AIST, 2-3-26 Aomi, Koto-ku, Tokyo, Japan
|
21
|
Frith MC, Khan S. A survey of localized sequence rearrangements in human DNA. Nucleic Acids Res 2019; 46:1661-1673. [PMID: 29272440 PMCID: PMC5829575 DOI: 10.1093/nar/gkx1266] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/30/2017] [Accepted: 12/07/2017] [Indexed: 01/29/2023]
Abstract
Genomes mutate and evolve in ways simple (substitution or deletion of bases) and complex (e.g. chromosome shattering). We do not fully understand what types of complex mutation occur, and we cannot routinely characterize arbitrarily-complex mutations in a high-throughput, genome-wide manner. Long-read DNA sequencing methods (e.g. PacBio, nanopore) are promising for this task, because one read may encompass a whole complex mutation. We describe an analysis pipeline to characterize arbitrarily-complex 'local' mutations, i.e. intrachromosomal mutations encompassed by one DNA read. We apply it to nanopore and PacBio reads from one human cell line (NA12878), and survey sequence rearrangements, both real and artifactual. Almost all the real rearrangements belong to recurring patterns or motifs: the most common is tandem multiplication (e.g. heptuplication), but there are also complex patterns such as localized shattering, which resembles DNA damage by radiation. Gene conversions are identified, including one between hemoglobin gamma genes. This study demonstrates a way to find intricate rearrangements with any number of duplications, deletions, and repositionings. It demonstrates a probability-based method to resolve ambiguous rearrangements involving highly similar sequences, as occurs in gene conversion. We present a catalog of local rearrangements in one human cell line, and show which rearrangement patterns occur.
Affiliation(s)
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
- Sofia Khan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
|
22
|
Liu X, Wang XS, Lee BJ, Wu-Baer FK, Lin X, Shao Z, Estes VM, Gautier J, Baer R, Zha S. CtIP is essential for early B cell proliferation and development in mice. J Exp Med 2019; 216:1648-1663. [PMID: 31097467 PMCID: PMC6605744 DOI: 10.1084/jem.20181139] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/16/2018] [Revised: 09/10/2018] [Accepted: 04/24/2019] [Indexed: 11/08/2022]
Abstract
B cell development requires efficient proliferation and successful assembly and modifications of the immunoglobulin gene products. CtIP is an essential gene implicated in end resection and DNA repair. Here, we show that CtIP is essential for early B cell development but dispensable in naive B cells. CtIP loss is well tolerated in G1-arrested B cells and during V(D)J recombination, but in proliferating B cells, CtIP loss leads to a progressive cell death characterized by ATM hyperactivation, G2/M arrest, genomic instability, and 53BP1 nuclear body formation, indicating that the essential role of CtIP during proliferation underscores its stage-specific requirement in B cells. B cell proliferation requires phosphorylation of CtIP at T847 presumably by CDK, but not its interaction with CtBP or Rb or its nuclease activity. CtIP phosphorylation by ATM/ATR at T859 (T855 in mice) promotes end resection in G1-arrested cells but is dispensable for B cell development and class switch recombination, suggesting distinct roles for T859 and T847 phosphorylation in B cell development.
Collapse
Affiliation(s)
- Xiangyu Liu
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
- Guangdong Key Laboratory of Genome Instability and Human Disease Prevention, Shenzhen University Carson Cancer Center, Department of Biochemistry and Molecular Biology, School of Medicine, Shenzhen University, Shenzhen, China
| | - Xiaobin S Wang
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
- Pathobiology and Human Disease Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Brian J Lee
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Foon K Wu-Baer
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Xiaohui Lin
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Zhengping Shao
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Verna M Estes
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Jean Gautier
- Department of Pathology and Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Richard Baer
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
- Department of Pathology and Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
| | - Shan Zha
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
- Department of Pathology and Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY
- Division of Pediatric Oncology, Hematology and Stem Cell Transplantation, Department of Pediatrics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY
| |
|
23
|
Jiang W, Estes VM, Wang XS, Shao Z, Lee BJ, Lin X, Crowe JL, Zha S. Phosphorylation at S2053 in Murine (S2056 in Human) DNA-PKcs Is Dispensable for Lymphocyte Development and Class Switch Recombination. The Journal of Immunology 2019; 203:178-187. [PMID: 31101667 DOI: 10.4049/jimmunol.1801657] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 12/21/2018] [Accepted: 04/22/2019] [Indexed: 11/19/2022]
Abstract
The classical nonhomologous end-joining (cNHEJ) pathway is a major DNA double-strand break repair pathway in mammalian cells and is required for lymphocyte development and maturation. The DNA-dependent protein kinase (DNA-PK) is a cNHEJ factor that encompasses the Ku70-Ku80 (KU) heterodimer and the large DNA-PK catalytic subunit (DNA-PKcs). In mouse models, loss of DNA-PKcs (DNA-PKcs-/-) abrogates end processing (e.g., hairpin opening), but not end-ligation, whereas expression of the kinase-dead DNA-PKcs protein (DNA-PKcsKD/KD) abrogates end-ligation, suggesting a kinase-dependent structural function of DNA-PKcs during cNHEJ. Lymphocyte development is abolished in DNA-PKcs-/- and DNA-PKcsKD/KD mice because of the requirement for both hairpin opening and end-ligation during V(D)J recombination. DNA-PKcs itself is the best-characterized substrate of DNA-PK. The S2056 cluster is the best-characterized autophosphorylation site in human DNA-PKcs. In this study, we show that radiation can induce phosphorylation of murine DNA-PKcs at the corresponding S2053. We also generated knockin mouse models with alanine- (DNA-PKcsPQR) or phospho-mimetic aspartate (DNA-PKcsSD) substitutions at the S2053 cluster. Despite moderate radiation sensitivity in the DNA-PKcsPQR/PQR fibroblasts and lymphocytes, both DNA-PKcsPQR/PQR and DNA-PKcsSD/SD mice retained normal kinase activity and underwent efficient V(D)J recombination and class switch recombination, indicating that phosphorylation at the S2053 cluster of murine DNA-PKcs (corresponding to S2056 of human DNA-PKcs), although important for radiation resistance, is dispensable for the end-ligation and hairpin-opening function of DNA-PK essential for lymphocyte development.
Affiliation(s)
- Wenxia Jiang
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
| | - Verna M Estes
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
| | - Xiaobin S Wang
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Graduate Program of Pathobiology and Molecular Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
| | - Zhengping Shao
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
| | - Brian J Lee
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
| | - Xiaohui Lin
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
| | - Jennifer L Crowe
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Graduate Program of Pathobiology and Molecular Medicine, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
| | - Shan Zha
- Institute for Cancer Genetics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Division of Pediatric Oncology, Hematology and Stem Cell Transplantation, Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
- Department of Pathology and Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032
| |
|
24
|
Dennenmoser S, Sedlazeck FJ, Schatz MC, Altmüller J, Zytnicki M, Nolte AW. Genome‐wide patterns of transposon proliferation in an evolutionary young hybrid fish. Mol Ecol 2019; 28:1491-1505. [DOI: 10.1111/mec.14969] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/30/2018] [Revised: 10/15/2018] [Accepted: 10/23/2018] [Indexed: 01/19/2023]
Affiliation(s)
- Stefan Dennenmoser
- Institute for Biology and Environmental Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany
| | | | - Michael C. Schatz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, Maryland
| | - Janine Altmüller
- Cologne Center for Genomics and Institute of Human Genetics, University of Cologne, Cologne, Germany
| | | | - Arne W. Nolte
- Institute for Biology and Environmental Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany
| |
|
25
|
Varadharajan S, Sandve SR, Gillard GB, Tørresen OK, Mulugeta TD, Hvidsten TR, Lien S, Asbjørn Vøllestad L, Jentoft S, Nederbragt AJ, Jakobsen KS. The Grayling Genome Reveals Selection on Gene Expression Regulation after Whole-Genome Duplication. Genome Biol Evol 2018; 10:2785-2800. [PMID: 30239729 PMCID: PMC6200313 DOI: 10.1093/gbe/evy201] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Accepted: 09/12/2018] [Indexed: 02/06/2023] Open
Abstract
Whole-genome duplication (WGD) has been a major evolutionary driver of increased genomic complexity in vertebrates. One such event occurred in the salmonid family ∼80 Ma (Ss4R) giving rise to a plethora of structural and regulatory duplicate-driven divergence, making salmonids an exemplary system to investigate the evolutionary consequences of WGD. Here, we present a draft genome assembly of European grayling (Thymallus thymallus) and use this in a comparative framework to study evolution of gene regulation following WGD. Among the Ss4R duplicates identified in European grayling and Atlantic salmon (Salmo salar), one-third reflect nonneutral tissue expression evolution, with strong purifying selection, maintained over ∼50 Myr. Of these, the majority reflect conserved tissue regulation under strong selective constraints related to brain and neural-related functions, as well as higher-order protein–protein interactions. A small subset of the duplicates have evolved tissue regulatory expression divergence in a common ancestor, which have been subsequently conserved in both lineages, suggestive of adaptive divergence following WGD. These candidates for adaptive tissue expression divergence have elevated rates of protein coding- and promoter-sequence evolution and are enriched for immune- and lipid metabolism ontology terms. Lastly, lineage-specific duplicate divergence points toward underlying differences in adaptive pressures on expression regulation in the nonanadromous grayling versus the anadromous Atlantic salmon. Our findings enhance our understanding of the role of WGD in genome evolution and highlight cases of regulatory divergence of Ss4R duplicates, possibly related to a niche shift in early salmonid evolution.
Affiliation(s)
- Srinidhi Varadharajan
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Norway
| | - Simen R Sandve
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway
| | - Gareth B Gillard
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
| | - Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Norway
| | - Teshome D Mulugeta
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway
| | - Torgeir R Hvidsten
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, Sweden
| | - Sigbjørn Lien
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, Norway
| | - Leif Asbjørn Vøllestad
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Norway
| | - Sissel Jentoft
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Norway
| | - Alexander J Nederbragt
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Norway
- Biomedical Informatics Research Group, Department of Informatics, University of Oslo, Norway
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Norway
| |
|
26
|
Hehir-Kwa JY, Tops BBJ, Kemmeren P. The clinical implementation of copy number detection in the age of next-generation sequencing. Expert Rev Mol Diagn 2018; 18:907-915. [PMID: 30221560 DOI: 10.1080/14737159.2018.1523723] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/18/2022]
Abstract
Introduction: The role of copy number variants (CNVs) in disease is now well established. In parallel, NGS technologies, such as long-read technologies, are under continual development, and data analysis methods continue to be refined. Clinical exome sequencing data is now a reality for many diagnostic laboratories in both congenital genetics and oncology. This provides the ability to detect and report both SNVs and structural variants, including CNVs, using a single assay for a wide range of patient cohorts. Areas covered: Currently, whole-genome sequencing is mainly restricted to research applications and clinical utility studies. Furthermore, detecting the full-size spectrum of CNVs as well as somatic events remains difficult for both exome and whole-genome sequencing. As a result, the full extent of genomic variants in an individual's genome is still largely unknown. Recently, new sequencing technologies have been introduced which maintain the long-range genomic context, aiding the detection of CNVs and structural variants. Expert commentary: The development of long-read sequencing promises to resolve many CNV and SV detection issues but is yet to become established. The current challenge for clinical CNV detection is how to fully exploit all the data generated by high-throughput sequencing technologies.
Affiliation(s)
- Jayne Y Hehir-Kwa
- Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
| | - Bastiaan B J Tops
- Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
| | - Patrick Kemmeren
- Princess Máxima Center for Pediatric Oncology, Utrecht, Netherlands
| |
|
27
|
Monlong J, Cossette P, Meloche C, Rouleau G, Girard SL, Bourque G. Human copy number variants are enriched in regions of low mappability. Nucleic Acids Res 2018; 46:7236-7249. [PMID: 30137632 PMCID: PMC6101599 DOI: 10.1093/nar/gky538] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 02/20/2018] [Revised: 05/04/2018] [Accepted: 06/12/2018] [Indexed: 12/18/2022] Open
Abstract
Copy number variants (CNVs) are known to affect a large portion of the human genome and have been implicated in many diseases. Although whole-genome sequencing (WGS) can help identify CNVs, most analytical methods suffer from limited sensitivity and specificity, especially in regions of low mappability. To address this, we use PopSV, a CNV caller that relies on multiple samples to control for technical variation. We demonstrate that our calls are stable across different types of repeat-rich regions and validate the accuracy of our predictions using orthogonal approaches. Applying PopSV to 640 human genomes, we find that low-mappability regions are approximately 5 times more likely to harbor germline CNVs, in stark contrast to the nearly uniform distribution observed for somatic CNVs in 95 cancer genomes. In addition to known enrichments in segmental duplication and near centromeres and telomeres, we also report that CNVs are enriched in specific types of satellite and in some of the most recent families of transposable elements. Finally, using this comprehensive approach, we identify 3455 regions with recurrent CNVs that were missing from existing catalogs. In particular, we identify 347 genes with a novel exonic CNV in low-mappability regions, including 29 genes previously associated with disease.
Affiliation(s)
- Jean Monlong
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Canadian Center for Computational Genomics, Montréal H3A 1A4, Canada
| | - Patrick Cossette
- Centre de Recherche du Centre Hospitalier de l’Université de Montréal, Montréal H2X 0A9, Canada
| | - Caroline Meloche
- Centre de Recherche du Centre Hospitalier de l’Université de Montréal, Montréal H2X 0A9, Canada
| | - Guy Rouleau
- Montreal Neurological Institute, McGill University, Montréal H3A 2B4, Canada
| | - Simon L Girard
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Centre de Recherche du Centre Hospitalier de l’Université de Montréal, Montréal H2X 0A9, Canada
- Département des sciences fondamentales, Université du Québec à Chicoutimi, Chicoutimi G7H 2B1, Canada
| | - Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal H3A 1B1, Canada
- Canadian Center for Computational Genomics, Montréal H3A 1A4, Canada
- McGill University and Génome Québec Innovation Center, Montréal H3A 1A4, Canada
| |
|
28
|
Kinase-dependent structural role of DNA-PKcs during immunoglobulin class switch recombination. Proc Natl Acad Sci U S A 2018; 115:8615-8620. [PMID: 30072430 DOI: 10.1073/pnas.1808490115] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/30/2022] Open
Abstract
The catalytic subunit of DNA-dependent protein kinase (DNA-PKcs) is a classical nonhomologous end-joining (cNHEJ) factor. Loss of DNA-PKcs diminished mature B cell class switch recombination (CSR) to other isotypes, but not IgG1. Here, we show that expression of the kinase-dead DNA-PKcs (DNA-PKcsKD/KD) severely compromises CSR to IgG1. High-throughput sequencing analyses of CSR junctions reveal frequent accumulation of nonproductive interchromosomal translocations, inversions, and extensive end resection in DNA-PKcsKD/KD, but not DNA-PKcs-/-, B cells. Meanwhile, the residual joints from DNA-PKcsKD/KD cells and the efficient Sµ-Sγ1 junctions from DNA-PKcs-/- B cells both display similar preferences for small (2-6 nt) microhomologies (MH). In DNA-PKcs-/- cells, Sµ-Sγ1 joints are more resistant to inversions and extensive resection than Sµ-Sε and Sµ-Sµ joints, providing a mechanism for the isotype-specific CSR defects. Together, our findings identify a kinase-dependent role of DNA-PKcs in suppressing MH-mediated end joining and a structural role of DNA-PKcs protein in the orientation of CSR.
|
29
|
Rajeh A, Lv J, Lin Z. Heterogeneous rates of genome rearrangement contributed to the disparity of species richness in Ascomycota. BMC Genomics 2018; 19:282. [PMID: 29690866 PMCID: PMC5937819 DOI: 10.1186/s12864-018-4683-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/16/2018] [Accepted: 04/16/2018] [Indexed: 01/06/2023] Open
Abstract
Background: Chromosomal rearrangements have been shown to facilitate speciation by creating a barrier to gene flow. However, it is not known whether heterogeneous rates of chromosomal rearrangement at the genome scale contributed to the huge disparity of species richness among different groups of organisms, which is one of the most remarkable and pervasive patterns on Earth. The largest fungal phylum Ascomycota is an ideal study system to address this question because it comprises three subphyla (Pezizomycotina, Saccharomycotina, and Taphrinomycotina) whose species numbers differ by two orders of magnitude (59,000, 1000, and 150 respectively). Results: We quantified rates of genome rearrangement for 71 Ascomycota species that have well-assembled genomes. The rates of inter-species genome rearrangement, which were inferred based on the divergence rates of gene order, are positively correlated with species richness at both ranks of subphylum and class in Ascomycota. This finding is further supported by our quantification of intra-species rearrangement rates based on paired-end genome sequencing data of 216 strains from three representative species, suggesting a difference of intrinsic genome instability among Ascomycota lineages. Our data also show that different rates of imbalanced rearrangements, such as deletions, are a major contributor to the heterogeneous rearrangement rates. Conclusions: Various lines of evidence in this study support that a higher rate of rearrangement at the genome scale might have accelerated the speciation process and increased species richness during the evolution of Ascomycota species. Our findings provide a plausible explanation for the species disparity among Ascomycota lineages, which will be valuable to unravel the underlying causes for the huge disparity of species richness in various taxonomic groups.
Affiliation(s)
- Ahmad Rajeh
- Department of Biology, Saint Louis University, St. Louis, MO, 63103, USA
- Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA
| | - Jie Lv
- Department of BioSciences, Rice University, Houston, TX, 77005, USA
| | - Zhenguo Lin
- Department of Biology, Saint Louis University, St. Louis, MO, 63103, USA
| |
|
30
|
Darracq A, Vitte C, Nicolas S, Duarte J, Pichon JP, Mary-Huard T, Chevalier C, Bérard A, Le Paslier MC, Rogowsky P, Charcosset A, Joets J. Sequence analysis of European maize inbred line F2 provides new insights into molecular and chromosomal characteristics of presence/absence variants. BMC Genomics 2018; 19:119. [PMID: 29402214 PMCID: PMC5800051 DOI: 10.1186/s12864-018-4490-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 04/04/2017] [Accepted: 01/22/2018] [Indexed: 11/10/2022] Open
Abstract
Background: Maize is well known for its exceptional structural diversity, including copy number variants (CNVs) and presence/absence variants (PAVs), and there is growing evidence for the role of structural variation in maize adaptation. While PAVs have been described in this important crop species, they have been only scarcely characterized at the sequence level, and the extent of presence/absence variation and the relative chromosomal landscape of inbred-specific regions remain to be elucidated. Results: De novo genome sequencing of the French F2 maize inbred line revealed 10,044 novel genomic regions larger than 1 kb, making up 88 Mb of DNA, that are present in F2 but not in B73 (PAV). This set of maize PAV sequences allowed us to annotate PAV content and to analyze sequence breakpoints. Using PAV genotyping on a collection of 25 temperate lines, we also analyzed linkage disequilibrium in PAVs and flanking regions, and PAV frequencies within maize genetic groups. Conclusions: We highlight the possible role of MMEJ-type double strand break repair in maize PAV formation and discover 395 new genes with transcriptional support. The pattern of linkage disequilibrium within PAVs differs strikingly from that of flanking regions and is consistent with the intuition that PAVs may recombine less than other genomic regions. We show that most PAVs are ancient, while some are found only in European Flint material, thus pinpointing structural features that may be at the origin of adaptive traits involved in the success of this material. Characterization of such PAVs will provide useful material for further association genetic studies in European and temperate maize.
Affiliation(s)
- Aude Darracq
- Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
| | - Clémentine Vitte
- Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
| | - Stéphane Nicolas
- Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
| | | | | | - Tristan Mary-Huard
- Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- MIA, INRA, AgroParisTech, Université Paris-Saclay, Paris, France
| | - Céline Chevalier
- Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
| | - Aurélie Bérard
- EPGV US 1279, INRA, CEA, IG-CNG, Université Paris-Saclay, Evry, France
| | | | - Peter Rogowsky
- Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRA, Lyon, France
| | - Alain Charcosset
- Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
| | - Johann Joets
- Genetique Quantitative et Evolution – Le Moulon, INRA, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
| |
|
31
|
Jeffares DC, Jolly C, Hoti M, Speed D, Shaw L, Rallis C, Balloux F, Dessimoz C, Bähler J, Sedlazeck FJ. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat Commun 2017; 8:14061. [PMID: 28117401 DOI: 10.1038/ncomms14061] [Citation(s) in RCA: 427] [Impact Index Per Article: 53.4] [Received: 05/20/2016] [Accepted: 11/24/2016] [Indexed: 02/08/2023] Open
Abstract
Large structural variations (SVs) within genomes are more challenging to identify than smaller genetic variants but may substantially contribute to phenotypic diversity and evolution. We analyse the effects of SVs on gene expression, quantitative traits and intrinsic reproductive isolation in the yeast Schizosaccharomyces pombe. We establish a high-quality curated catalogue of SVs in the genomes of a worldwide library of S. pombe strains, including duplications, deletions, inversions and translocations. We show that copy number variants (CNVs) show a variety of genetic signals consistent with rapid turnover. These transient CNVs produce stoichiometric effects on gene expression both within and outside the duplicated regions. CNVs make substantial contributions to quantitative traits, most notably intracellular amino acid concentrations, growth under stress and sugar utilization in winemaking, whereas rearrangements are strongly associated with reproductive isolation. Collectively, these findings have broad implications for evolution and for our understanding of quantitative traits including complex human diseases.
Affiliation(s)
- Daniel C Jeffares
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | - Clemency Jolly
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - Mimoza Hoti
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
| | - Doug Speed
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | - Liam Shaw
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | - Charalampos Rallis
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | - Francois Balloux
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | - Christophe Dessimoz
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Department of Ecology and Evolution and Center for Integrative Genomics, University of Lausanne, Biophore, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Biophore, Lausanne 1015, Switzerland
| | - Jürg Bähler
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- UCL Genetics Institute, University College London, London WC1E 6BT, UK
| | - Fritz J Sedlazeck
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
| |
|
32
|
Stuart T, Eichten SR, Cahn J, Karpievitch YV, Borevitz JO, Lister R. Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation. eLife 2016; 5. [PMID: 27911260 PMCID: PMC5167521 DOI: 10.7554/elife.20777] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Received: 08/18/2016] [Accepted: 12/01/2016] [Indexed: 01/09/2023] Open
Abstract
Variation in the presence or absence of transposable elements (TEs) is a major source of genetic variation between individuals. Here, we identified 23,095 TE presence/absence variants between 216 Arabidopsis accessions. Most TE variants were rare, and we find these rare variants associated with local extremes of gene expression and DNA methylation levels within the population. Of the common alleles identified, two thirds were not in linkage disequilibrium with nearby SNPs, implicating these variants as a source of novel genetic diversity. Many common TE variants were associated with significantly altered expression of nearby genes, and a major fraction of inter-accession DNA methylation differences were associated with nearby TE insertions. Overall, this demonstrates that TE variants are a rich source of genetic diversity that likely plays an important role in facilitating epigenomic and transcriptional differences between individuals, and indicates a strong genetic basis for epigenetic variation.
Affiliation(s)
- Tim Stuart
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
| | - Steven R Eichten
- ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, Australia
| | - Jonathan Cahn
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
| | - Yuliya V Karpievitch
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
| | - Justin O Borevitz
- ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, Australia
| | - Ryan Lister
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
| |
|
33
|
Liu B, Gao Y, Wang Y. LAMSA: fast split read alignment with long approximate matches. Bioinformatics 2016; 33:192-201. [DOI: 10.1093/bioinformatics/btw594] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/08/2015] [Revised: 07/20/2016] [Accepted: 09/08/2016] [Indexed: 12/20/2022] Open
|
34
|
Eichten SR, Stuart T, Srivastava A, Lister R, Borevitz JO. DNA methylation profiles of diverse Brachypodium distachyon align with underlying genetic diversity. Genome Res 2016; 26:1520-1531. [PMID: 27613611 PMCID: PMC5088594 DOI: 10.1101/gr.205468.116] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 02/11/2016] [Accepted: 08/25/2016] [Indexed: 12/13/2022]
Abstract
DNA methylation, a common modification of genomic DNA, is known to influence the expression of transposable elements as well as some genes. Although commonly viewed as an epigenetic mark, evidence has shown that underlying genetic variation, such as transposable element polymorphisms, often associate with differential DNA methylation states. To investigate the role of DNA methylation variation, transposable element polymorphism, and genomic diversity, whole-genome bisulfite sequencing was performed on genetically diverse lines of the model cereal Brachypodium distachyon. Although DNA methylation profiles are broadly similar, thousands of differentially methylated regions are observed between lines. An analysis of novel transposable element indel variation highlighted hundreds of new polymorphisms not seen in the reference sequence. DNA methylation and transposable element variation is correlated with the genome-wide amount of genetic variation present between samples. However, there was minimal evidence that novel transposon insertions or deletions are associated with nearby differential methylation. This study highlights unique relationships between genetic variation and DNA methylation variation within Brachypodium and provides a valuable map of DNA methylation across diverse resequenced accessions of this model cereal species.
Affiliation(s)
- Steven R Eichten
- ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, Australia, 2601
| | - Tim Stuart
- ARC Centre of Excellence in Plant Energy Biology, University of Western Australia, Perth, Australia, 6009
| | - Akanksha Srivastava
- ARC Centre of Excellence in Plant Energy Biology, University of Western Australia, Perth, Australia, 6009
| | - Ryan Lister
- ARC Centre of Excellence in Plant Energy Biology, University of Western Australia, Perth, Australia, 6009
| | - Justin O Borevitz
- ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, Australia, 2601
| |
|
35
|
Shaver TM, Lehmann BD, Beeler JS, Li CI, Li Z, Jin H, Stricker TP, Shyr Y, Pietenpol JA. Diverse, Biologically Relevant, and Targetable Gene Rearrangements in Triple-Negative Breast Cancer and Other Malignancies. Cancer Res 2016; 76:4850-60. [PMID: 27231203 DOI: 10.1158/0008-5472.can-16-0058] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 01/14/2016] [Accepted: 05/11/2016] [Indexed: 12/20/2022]
Abstract
Triple-negative breast cancer (TNBC) and other molecularly heterogeneous malignancies present a significant clinical challenge due to a lack of high-frequency "driver" alterations amenable to therapeutic intervention. These cancers often exhibit genomic instability, resulting in chromosomal rearrangements that affect the structure and expression of protein-coding genes. However, identification of these rearrangements remains technically challenging. Using a newly developed approach that quantitatively predicts gene rearrangements in tumor-derived genetic material, we identified and characterized a novel oncogenic fusion involving the MER proto-oncogene tyrosine kinase (MERTK) and discovered a clinical occurrence and cell line model of the targetable FGFR3-TACC3 fusion in TNBC. Expanding our analysis to other malignancies, we identified a diverse array of novel and known hybrid transcripts, including rearrangements between noncoding regions and clinically relevant genes such as ALK, CSF1R, and CD274/PD-L1. The more than 1,000 genetic alterations we identified highlight the importance of considering noncoding gene rearrangement partners, and the targetable gene fusions identified in TNBC demonstrate the need to advance gene fusion detection for molecularly heterogeneous cancers. Cancer Res; 76(16); 4850-60. ©2016 AACR.
Affiliation(s)
- Timothy M Shaver
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee. Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
- Brian D Lehmann
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee. Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
- J Scott Beeler
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee. Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
- Chung-I Li
- Department of Statistics, National Cheng Kung University, Tainan, Taiwan
- Zhu Li
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee. Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
- Hailing Jin
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee. Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
- Thomas P Stricker
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee
- Yu Shyr
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee. Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Jennifer A Pietenpol
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee. Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
|
36
|
Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing. Nat Protoc 2016; 11:853-71. [PMID: 27031497 DOI: 10.1038/nprot.2016.043] [Citation(s) in RCA: 191] [Impact Index Per Article: 21.2] [Indexed: 12/16/2022]
Abstract
Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those to evaluate the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide translocation sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation in cultured mammalian cells to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina MiSeq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement with a turnaround time of <1 week.
|
37
|
Structural variation discovery in the cancer genome using next generation sequencing: computational solutions and perspectives. Oncotarget 2016; 6:5477-89. [PMID: 25849937 PMCID: PMC4467381 DOI: 10.18632/oncotarget.3491] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 12/04/2014] [Accepted: 02/04/2015] [Indexed: 01/03/2023] Open
Abstract
Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by an NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide to relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome.
|
38
|
Hazen JL, Faust GG, Rodriguez AR, Ferguson WC, Shumilina S, Clark RA, Boland MJ, Martin G, Chubukov P, Tsunemoto RK, Torkamani A, Kupriyanov S, Hall IM, Baldwin KK. The Complete Genome Sequences, Unique Mutational Spectra, and Developmental Potency of Adult Neurons Revealed by Cloning. Neuron 2016; 89:1223-1236. [PMID: 26948891 DOI: 10.1016/j.neuron.2016.02.004] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 10/16/2015] [Revised: 12/14/2015] [Accepted: 01/13/2016] [Indexed: 02/07/2023]
Abstract
Somatic mutation in neurons is linked to neurologic disease and implicated in cell-type diversification. However, the origin, extent, and patterns of genomic mutation in neurons remain unknown. We established a nuclear transfer method to clonally amplify the genomes of neurons from adult mice for whole-genome sequencing. Comprehensive mutation detection and independent validation revealed that individual neurons harbor ∼100 unique mutations from all classes but lack recurrent rearrangements. Most neurons contain at least one gene-disrupting mutation and rare (0-2) mobile element insertions. The frequency and gene bias of neuronal mutations differ from other lineages, potentially due to novel mechanisms governing postmitotic mutation. Fertile mice were cloned from several neurons, establishing the compatibility of mutated adult neuronal genomes with reprogramming to pluripotency and development.
Affiliation(s)
- Jennifer L Hazen
- Department of Molecular and Cellular Neuroscience, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla CA 92037, USA
- Gregory G Faust
- Department of Biochemistry and Molecular Genetics, 1340 Jefferson Park Ave, University of Virginia School of Medicine, Charlottesville, VA 22901, USA
- Alberto R Rodriguez
- Mouse Genetics Core Facility, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA 92037, USA
- William C Ferguson
- Department of Molecular and Cellular Neuroscience, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla CA 92037, USA
- Svetlana Shumilina
- Department of Biochemistry and Molecular Genetics, 1340 Jefferson Park Ave, University of Virginia School of Medicine, Charlottesville, VA 22901, USA
- Royden A Clark
- Department of Biochemistry and Molecular Genetics, 1340 Jefferson Park Ave, University of Virginia School of Medicine, Charlottesville, VA 22901, USA
- Michael J Boland
- Department of Molecular and Cellular Neuroscience, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla CA 92037, USA
- Greg Martin
- Mouse Genetics Core Facility, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA 92037, USA
- Pavel Chubukov
- Department of Molecular and Cellular Neuroscience, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla CA 92037, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA 92037, USA
- Rachel K Tsunemoto
- Department of Molecular and Cellular Neuroscience, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla CA 92037, USA; Neuroscience Graduate Program, 9500 Gilman Drive, University of California San Diego, La Jolla, California, USA
- Ali Torkamani
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA 92037, USA
- Sergey Kupriyanov
- Mouse Genetics Core Facility, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA 92037, USA
- Ira M Hall
- McDonnell Genome Institute, Washington University School of Medicine, 4444 Forest Park Ave, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, 660 S Euclid Ave, St. Louis, MO 63110, USA
- Kristin K Baldwin
- Department of Molecular and Cellular Neuroscience, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla CA 92037, USA; Neuroscience Graduate Program, 9500 Gilman Drive, University of California San Diego, La Jolla, California, USA
|
39
|
Guan P, Sung WK. Structural variation detection using next-generation sequencing data: A comparative technical review. Methods 2016; 102:36-49. [PMID: 26845461 DOI: 10.1016/j.ymeth.2016.01.020] [Citation(s) in RCA: 108] [Impact Index Per Article: 12.0] [Received: 10/18/2015] [Revised: 01/09/2016] [Accepted: 01/31/2016] [Indexed: 12/11/2022] Open
Abstract
Structural variations (SVs) are genomic mutations at least fifty nucleotides in size. They contribute to phenotypic differences among healthy individuals and cause severe diseases, including cancers, by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed, owing to the significant cost reduction of NGS experiments and their ability to detect SVs without bias at base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. In addition, noise in the data (both platform-specific sequencing errors and artificial chimeric reads) impedes the specificity of SV detection. Poorly characterized regions in the human genome (e.g., repeat regions) greatly impact read mapping and in turn affect SV calling accuracy. Calling complex SVs requires specialized SV callers. Apart from accuracy, the processing speed of an SV caller is another factor that determines its usability. Knowing the pros and cons of different SV calling techniques and the objectives of the biological study is essential for biologists and bioinformaticians to make informed decisions. This paper describes the different components of the SV calling pipeline and reviews the techniques used by existing SV callers. Through a simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies.
Affiliation(s)
- Peiyong Guan
- School of Computing, National University of Singapore, 117543, Singapore
- Wing-Kin Sung
- School of Computing, National University of Singapore, 117543, Singapore; Computational & Mathematical Biology Group, Genome Institute of Singapore, 138672, Singapore
|
40
|
Zhang Z, Wang J, Luo J, Ding X, Zhong J, Wang J, Wu FX, Pan Y. Sprites: detection of deletions from sequencing data by re-aligning split reads. Bioinformatics 2016; 32:1788-96. [PMID: 26833342 DOI: 10.1093/bioinformatics/btw053] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/05/2015] [Accepted: 01/25/2016] [Indexed: 01/22/2023] Open
Abstract
MOTIVATION: Advances in next-generation sequencing technologies and the availability of short-read data enable the detection of structural variations (SVs). Deletions, an important type of SV, have been associated with genetic diseases. There are three types of deletions: blunt deletions, deletions with microhomologies and deletions with microinsertions. The last two types are very common in the human genome, but they are difficult to detect. It is therefore highly appealing to develop sensitive and accurate methods to detect deletions from sequencing data, especially deletions with microhomologies and deletions with microinsertions. RESULTS: We present a novel method called Sprites (SPlit Read re-alIgnment To dEtect Structural variants) which finds deletions from sequencing data. It aligns a whole soft-clipped read, rather than only its clipped part, to the target sequence, a segment of the reference determined by spanning reads, in order to find the longest prefix or suffix of the read that has a match in the target sequence. This alignment aims to solve the problem of deletions with microhomologies and deletions with microinsertions. Using both simulated and real data, we show that Sprites performs better at detecting deletions than other current methods in terms of F-score. AVAILABILITY AND IMPLEMENTATION: Sprites is open source software and freely available at https://github.com/zhangzhen/sprites. CONTACT: jxwang@mail.csu.edu.cn. SUPPLEMENTARY DATA: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhen Zhang
- School of Information Science and Engineering, Central South University, Changsha, 410083, China; College of Information and Communication Engineering, Hunan Institute of Science and Technology, Yueyang, 414006, China
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, 410083, China
- Junwei Luo
- School of Information Science and Engineering, Central South University, Changsha, 410083, China
- Xiaojun Ding
- School of Information Science and Engineering, Central South University, Changsha, 410083, China
- Jiancheng Zhong
- School of Information Science and Engineering, Central South University, Changsha, 410083, China
- Jun Wang
- Department of Molecular Physiology & Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
- Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA 30302-4110, USA
|
41
|
Orientation-specific joining of AID-initiated DNA breaks promotes antibody class switching. Nature 2015; 525:134-139. [PMID: 26308889 PMCID: PMC4592165 DOI: 10.1038/nature14970] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Received: 04/07/2015] [Accepted: 07/21/2015] [Indexed: 01/08/2023]
Abstract
During B-cell development, RAG endonuclease cleaves immunoglobulin heavy chain (IgH) V, D, and J gene segments and orchestrates their fusion as deletional events that assemble a V(D)J exon in the same transcriptional orientation as adjacent Cμ constant region exons. In mice, six additional sets of constant region exons (CHs) lie 100-200 kilobases downstream in the same transcriptional orientation as V(D)J and Cμ exons. Long repetitive switch (S) regions precede Cμ and downstream CHs. In mature B cells, class switch recombination (CSR) generates different antibody classes by replacing Cμ with a downstream CH (ref. 2). Activation-induced cytidine deaminase (AID) initiates CSR by promoting deamination lesions within Sμ and a downstream acceptor S region; these lesions are converted into DNA double-strand breaks (DSBs) by general DNA repair factors. Productive CSR must occur in a deletional orientation by joining the upstream end of an Sμ DSB to the downstream end of an acceptor S-region DSB. However, the relative frequency of deletional to inversional CSR junctions has not been measured. Thus, whether orientation-specific joining is a programmed mechanistic feature of CSR as it is for V(D)J recombination and, if so, how this is achieved is unknown. To address this question, we adapt high-throughput genome-wide translocation sequencing into a highly sensitive DSB end-joining assay and apply it to endogenous AID-initiated S-region DSBs in mouse B cells. We show that CSR is programmed to occur in a productive deletional orientation and does so via an unprecedented mechanism that involves in cis Igh organizational features in combination with frequent S-region DSBs initiated by AID. We further implicate ATM-dependent DSB-response factors in enforcing this mechanism and provide an explanation of why CSR is so reliant on the 53BP1 DSB-response factor.
|
42
|
Zhuang J, Weng Z. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes. Nucleic Acids Res 2015; 43:8146-56. [PMID: 26283183 PMCID: PMC4787836 DOI: 10.1093/nar/gkv831] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/07/2015] [Accepted: 08/06/2015] [Indexed: 01/03/2023] Open
Abstract
Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five types of solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recurrently mutated by single nucleotide alterations differed from the genes recurrently mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs.
Affiliation(s)
- Jiali Zhuang
- Program in Bioinformatics and Integrative Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
|
43
|
Lim JQ, Tennakoon C, Guan P, Sung WK. BatAlign: an incremental method for accurate alignment of sequencing reads. Nucleic Acids Res 2015; 43:e107. [PMID: 26170239 PMCID: PMC4652746 DOI: 10.1093/nar/gkv533] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 02/09/2015] [Accepted: 05/09/2015] [Indexed: 11/12/2022] Open
Abstract
Structural variations (SVs) play a crucial role in genetic diversity. However, the alignments of reads near or across SVs are made inaccurate by the presence of polymorphisms. BatAlign is an algorithm that integrates two strategies, called 'Reverse-Alignment' and 'Deep-Scan', to improve the accuracy of read alignment. In our experiments, BatAlign obtained the highest F-measures in read alignments on mismatch-aberrant, indel-aberrant, concordantly/discordantly paired and SV-spanning data sets. On real data, the alignments of BatAlign recovered 4.3% more PCR-validated SVs with 73.3% fewer calls. These results suggest that BatAlign is effective in accurately detecting SVs and other polymorphic variants using high-throughput data. BatAlign is publicly available at https://goo.gl/a6phxB.
Affiliation(s)
- Jing-Quan Lim
- Department of Computer Science, National University of Singapore, Singapore 117417; Laboratory of Cancer Epigenome, Division of Medical Sciences, National Cancer Centre Singapore, Singapore 169610
- Chandana Tennakoon
- Department of Computer Science, National University of Singapore, Singapore 117417; NUS Graduate School for Integrative Sciences and Engineering, (CeLS), #05-01, 28 Medical Drive, Singapore 117456; Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672; UAE University, PO Box 17551, Al Ain, UAE
- Peiyong Guan
- Department of Computer Science, National University of Singapore, Singapore 117417
- Wing-Kin Sung
- Department of Computer Science, National University of Singapore, Singapore 117417; Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672
|
44
|
Zhao H, Zhao F. BreakSeek: a breakpoint-based algorithm for full spectral range INDEL detection. Nucleic Acids Res 2015; 43:6701-13. [PMID: 26117537 PMCID: PMC4538813 DOI: 10.1093/nar/gkv605] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/16/2015] [Accepted: 05/28/2015] [Indexed: 11/18/2022] Open
Abstract
Although recently developed algorithms have integrated multiple signals to improve sensitivity for insertion and deletion (INDEL) detection, they are far from perfect and still have great limitations in detecting the full size range of INDELs. Here we present BreakSeek, a novel breakpoint-based algorithm that can detect both homozygous and heterozygous INDELs efficiently and without bias, ranging from several base pairs to over thousands of base pairs, with accurate breakpoint and heterozygosity rate estimations. Comprehensive evaluations on both simulated and real datasets revealed that BreakSeek outperformed other existing methods on both sensitivity and specificity in detecting both small and large INDELs, and uncovered a significant number of novel INDELs that had previously been missed. In addition, by incorporating sophisticated statistical models, we investigated and demonstrated, for the first time, the importance of handling false and conflicting signals in multi-signal integrated methods.
Affiliation(s)
- Hui Zhao
- Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
- Fangqing Zhao
- Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
|
45
|
Vyverman M, Baets BD, Fack V, Dawyndt P. A Long Fragment Aligner called ALFALFA. BMC Bioinformatics 2015; 16:159. [PMID: 25971785 PMCID: PMC4449525 DOI: 10.1186/s12859-015-0533-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/02/2015] [Accepted: 03/09/2015] [Indexed: 12/31/2022] Open
Abstract
Background: Rapid evolutions in sequencing technology force read mappers into flexible adaptation to longer reads, changing error models, memory barriers and novel applications. Results: ALFALFA achieves a high performance in accurately mapping long single-end and paired-end reads to gigabase-scale reference genomes, while remaining competitive for mapping shorter reads. Its seed-and-extend workflow is underpinned by fast retrieval of super-maximal exact matches from an enhanced sparse suffix array, with flexible parameter tuning to balance performance, memory footprint and accuracy. Conclusions: ALFALFA is open source and available at http://alfalfa.ugent.be. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0533-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Michaël Vyverman
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281 Building S9, Ghent, B-9000, Belgium.
- Bernard De Baets
- Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure links 653, Ghent, B-9000, Belgium.
- Veerle Fack
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281 Building S9, Ghent, B-9000, Belgium.
- Peter Dawyndt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281 Building S9, Ghent, B-9000, Belgium.
|
46
|
Frock RL, Hu J, Meyers RM, Ho YJ, Kii E, Alt FW. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat Biotechnol 2015; 33:179-86. [PMID: 25503383 PMCID: PMC4320661 DOI: 10.1038/nbt.3101] [Citation(s) in RCA: 515] [Impact Index Per Article: 51.5] [Received: 09/02/2014] [Accepted: 11/14/2014] [Indexed: 12/14/2022]
Abstract
Although great progress has been made in the characterization of the off-target effects of engineered nucleases, sensitive and unbiased genome-wide methods for the detection of off-target cleavage events and potential collateral damage are still lacking. Here we describe a linear amplification-mediated modification of a previously published high-throughput, genome-wide, translocation sequencing (HTGTS) method that robustly detects DNA double-stranded breaks (DSBs) generated by engineered nucleases across the human genome based on their translocation to other endogenous or ectopic DSBs. HTGTS with different Cas9:sgRNA or TALEN nucleases revealed off-target hotspot numbers for given nucleases that ranged from a few or none to dozens or more, and extended the number of known off-targets for certain previously characterized nucleases more than tenfold. We also identified translocations between bona fide nuclease targets on homologous chromosomes, an undesired collateral effect that has not been described previously. Finally, HTGTS confirmed that the Cas9D10A paired nickase approach suppresses off-target cleavage genome-wide.
Affiliation(s)
- Richard L. Frock
- Program in Cellular and Molecular Medicine, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Howard Hughes Medical Institute, Boston, Massachusetts, USA
- Jiazhi Hu
- Program in Cellular and Molecular Medicine, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Howard Hughes Medical Institute, Boston, Massachusetts, USA
- Robin M. Meyers
- Program in Cellular and Molecular Medicine, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Howard Hughes Medical Institute, Boston, Massachusetts, USA
- Yu-Jui Ho
- Program in Cellular and Molecular Medicine, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Howard Hughes Medical Institute, Boston, Massachusetts, USA
- Erina Kii
- Program in Cellular and Molecular Medicine, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Howard Hughes Medical Institute, Boston, Massachusetts, USA
- Frederick W. Alt
- Program in Cellular and Molecular Medicine, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Howard Hughes Medical Institute, Boston, Massachusetts, USA
|
47
|
Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol 2014; 15:R84. [PMID: 24970577 PMCID: PMC4197822 DOI: 10.1186/gb-2014-15-6-r84] [Citation(s) in RCA: 1015] [Impact Index Per Article: 92.3] [Received: 11/27/2013] [Accepted: 06/26/2014] [Indexed: 12/30/2022] Open
Abstract
Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals including read-pair, split-read, read-depth and prior knowledge. Owing to technical challenges, extant SV discovery algorithms either use one signal in isolation, or at best use two sequentially. We present LUMPY, a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. We show that LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low coverage data or low intra-sample variant allele frequency. We also report a set of 4,564 validated breakpoints from the NA12878 human genome. LUMPY is available at https://github.com/arq5x/lumpy-sv.
|
48
|
English AC, Salerno WJ, Reid JG. PBHoney: identifying genomic variants via long-read discordance and interrupted mapping. BMC Bioinformatics 2014; 15:180. [PMID: 24915764 PMCID: PMC4082283 DOI: 10.1186/1471-2105-15-180] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Received: 02/18/2014] [Accepted: 06/04/2014] [Indexed: 11/25/2022] Open
Abstract
Background: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited by both the incomplete information provided by sequencing reads and the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants often inaccessible to shorter reads. Results: We present PBHoney, software that considers both intra-read discordance and soft-clipped tails of long reads (>10,000 bp) to identify structural variants. As a proof of concept, we identify four structural variants and two genomic features in a strain of Escherichia coli with PBHoney and validate them via de novo assembly. PBHoney is available for download at http://sourceforge.net/projects/pb-jelly/. Conclusions: Implementing two variant-identification approaches that exploit the high mappability of long reads, PBHoney is demonstrated as being effective at detecting larger structural variants using whole-genome Pacific Biosciences RS II Continuous Long Reads. Furthermore, PBHoney is able to discover two genomic features: the existence of Rac-Phage in isolate; evidence of E. coli’s circular genome.
Affiliation(s)
- Adam C English
- Human Genome Sequencing Center at Baylor College of Medicine, One Baylor Plaza, Houston 77030, Texas, USA.
49
Abstract
MOTIVATION: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times. RESULTS: We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while dramatically reducing overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results. AVAILABILITY AND IMPLEMENTATION: SAMBLASTER is open-source C++ code and freely available for download from https://github.com/GregoryFaust/samblaster.
Affiliation(s)
- Gregory G Faust
- Department of Biochemistry and Molecular Genetics and Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Ira M Hall
- Department of Biochemistry and Molecular Genetics and Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
50
Malhotra A, Lindberg M, Faust GG, Leibowitz ML, Clark RA, Layer RM, Quinlan AR, Hall IM. Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms. Genome Res 2013; 23:762-76. [PMID: 23410887 PMCID: PMC3638133 DOI: 10.1101/gr.143677.112] [Citation(s) in RCA: 135] [Impact Index Per Article: 11.3] [Indexed: 12/13/2022]
Abstract
Tumor genomes are generally thought to evolve through a gradual accumulation of mutations, but the observation that extraordinarily complex rearrangements can arise through single mutational events suggests that evolution may be accelerated by punctuated changes in genome architecture. To assess the prevalence and origins of complex genomic rearrangements (CGRs), we mapped 6179 somatic structural variation breakpoints in 64 cancer genomes from seven tumor types and screened for clusters of three or more interconnected breakpoints. We find that complex breakpoint clusters are extremely common: 154 clusters comprise 25% of all somatic breakpoints, and 75% of tumors exhibit at least one complex cluster. Based on copy number state profiling, 63% of breakpoint clusters are consistent with being CGRs that arose through a single mutational event. CGRs have diverse architectures including focal breakpoint clusters, large-scale rearrangements joining clusters from one or more chromosomes, and staggeringly complex chromothripsis events. Notably, chromothripsis has a significantly higher incidence in glioblastoma samples (39%) relative to other tumor types (9%). Chromothripsis breakpoints also show significantly elevated intra-tumor allele frequencies relative to simple SVs, which indicates that they arise early during tumorigenesis or confer selective advantage. Finally, assembly and analysis of 4002 somatic and 6982 germline breakpoint sequences reveal that somatic breakpoints show significantly less microhomology and fewer templated insertions than germline breakpoints, and this effect is stronger at CGRs than at simple variants. These results are inconsistent with replication-based models of CGR genesis and strongly argue that nonhomologous repair of concurrently arising DNA double-strand breaks is the predominant mechanism underlying complex cancer genome rearrangements.
Affiliation(s)
- Ankit Malhotra
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia 22903, USA