1
Kosugi S, Terao C. Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data. Hum Genome Var 2024; 11:18. [PMID: 38632226 PMCID: PMC11024196 DOI: 10.1038/s41439-024-00276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/19/2024] Open
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Affiliation(s)
- Shunichi Kosugi
  - Center for Genome Informatics, Research Organization of Information and Systems, Joint Support-Center for Data Science Research, Shizuoka, Japan
  - Advanced Genomics Center, National Institute of Genetics, Shizuoka, Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
- Chikashi Terao
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
  - The Department of Applied Genetics, The School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
2
Wang N, Khan S, Elo LL. VarSCAT: A computational tool for sequence context annotations of genomic variants. PLoS Comput Biol 2023; 19:e1010727. [PMID: 37566612 PMCID: PMC10446208 DOI: 10.1371/journal.pcbi.1010727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 08/23/2023] [Accepted: 07/20/2023] [Indexed: 08/13/2023] Open
Abstract
The sequence contexts of genomic variants play important roles in understanding biological significances of variants and potential sequencing related variant calling issues. However, methods for assessing the diverse sequence contexts of genomic variants such as tandem repeats and unambiguous annotations have been limited. Herein, we describe the Variant Sequence Context Annotation Tool (VarSCAT) for annotating the sequence contexts of genomic variants, including breakpoint ambiguities, flanking bases of variants, wildtype/mutated DNA sequences, variant nomenclatures, distances between adjacent variants, tandem repeat regions, and custom annotation with user customizable options. Our analyses demonstrate that VarSCAT is more versatile and customizable than the currently available methods or strategies for annotating variants in short tandem repeat (STR) regions or insertions and deletions (indels) with breakpoint ambiguity. Variant sequence context annotations of high-confidence human variant sets with VarSCAT revealed that more than 75% of all human individual germline and clinically relevant indels have breakpoint ambiguities. Moreover, we illustrate that more than 80% of human individual germline small variants in STR regions are indels and that the sizes of these indels correlated with STR motif sizes. VarSCAT is available from https://github.com/elolab/VarSCAT.
Affiliation(s)
- Ning Wang
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - InFLAMES Research Flagship Center, University of Turku, Turku, Finland
- Sofia Khan
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Laura L. Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - InFLAMES Research Flagship Center, University of Turku, Turku, Finland
  - Institute of Biomedicine, University of Turku, Turku, Finland
3
Identification of the Telomere elongation Mutation in Drosophila. Cells 2022; 11:cells11213484. [DOI: 10.3390/cells11213484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 10/31/2022] [Accepted: 11/01/2022] [Indexed: 11/06/2022] Open
Abstract
Telomeres in Drosophila melanogaster, which have inspired a large part of Sergio Pimpinelli's work, are similar to those of other eukaryotes in terms of their function. Yet, their length maintenance relies on the transposition of the specialized retrotransposons Het-A, TART, and TAHRE, rather than on the activity of the enzyme telomerase, as occurs in most other eukaryotic organisms. The length of the telomeres in Drosophila thus depends on the number of copies of these transposable elements. Our previous work has led to the isolation of a dominant mutation, Tel1, that caused a several-fold elongation of telomeres. In this study, we molecularly identified the Tel1 mutation by a combination of transposon-induced, site-specific recombination and next-generation sequencing. Recombination located Tel1 to a 15 kb region in 92A. Comparison of the DNA sequence in this region with the Drosophila Genetic Reference Panel of wild-type genomic sequences delimited Tel1 to a 3 bp deletion inside intron 8 of Ino80. Furthermore, CRISPR/Cas9-induced deletions surrounding the same region exhibited the Tel1 telomere phenotype, confirming a strict requirement of this intron 8 gene sequence for a proper regulation of Drosophila telomere length.
4
Zanti M, Michailidou K, Loizidou MA, Machattou C, Pirpa P, Christodoulou K, Spyrou GM, Kyriacou K, Hadjisavvas A. Performance evaluation of pipelines for mapping, variant calling and interval padding, for the analysis of NGS germline panels. BMC Bioinformatics 2021; 22:218. [PMID: 33910496 PMCID: PMC8080428 DOI: 10.1186/s12859-021-04144-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/16/2021] [Accepted: 04/15/2021] [Indexed: 11/10/2022] Open
Abstract
Background: Next-generation sequencing (NGS) represents a significant advancement in clinical genetics. However, its use creates several technical, data interpretation and management challenges. It is essential to follow a consistent data analysis pipeline to achieve the highest possible accuracy and avoid false variant calls. Herein, we aimed to compare the performance of twenty-eight combinations of NGS data analysis pipeline compartments, including short-read mapping (BWA-MEM, Bowtie2, Stampy), variant calling (GATK-HaplotypeCaller, GATK-UnifiedGenotyper, SAMtools) and interval padding (null, 50 bp, 100 bp) methods, along with a commercially available pipeline (BWA Enrichment, Illumina®). Fourteen germline DNA samples from breast cancer patients were sequenced using a targeted NGS panel approach and subjected to data analysis. Results: We highlight that interval padding is required for the accurate detection of intronic variants including spliceogenic pathogenic variants (PVs). In addition, using nearly default parameters, the BWA Enrichment algorithm failed to detect these spliceogenic PVs and a missense PV in the TP53 gene. We also recommend the BWA-MEM algorithm for sequence alignment, whereas variant calling should be performed using a combination of variant calling algorithms: GATK-HaplotypeCaller and SAMtools for the accurate detection of insertions/deletions and GATK-UnifiedGenotyper for the efficient detection of single nucleotide variant calls. Conclusions: These findings have important implications towards the identification of clinically actionable variants through panel testing in a clinical laboratory setting, when dedicated bioinformatics personnel might not always be available. The results also reveal the necessity of improving the existing tools and/or at the same time developing new pipelines to generate more reliable and more consistent data.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04144-1.
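To make the interval padding step compared in this abstract (none, 50 bp, 100 bp) concrete, here is a minimal sketch, assuming target regions are given as 0-based half-open (chromosome, start, end) tuples. The names `pad_intervals` and `covers`, and the coordinates, are hypothetical and not taken from the study.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def pad_intervals(intervals: List[Interval], padding: int) -> List[Interval]:
    """Extend each interval by `padding` bp on both sides, then merge overlaps."""
    padded = sorted((c, max(0, s - padding), e + padding) for c, s, e in intervals)
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covers(intervals: List[Interval], chrom: str, pos: int) -> bool:
    """Return True if `pos` falls inside any interval on `chrom`."""
    return any(c == chrom and s <= pos < e for c, s, e in intervals)


# Hypothetical exon targets and an intronic position just past the first exon.
exons = [("chr17", 1000, 1200), ("chr17", 1260, 1400)]
unpadded = pad_intervals(exons, 0)
padded_50 = pad_intervals(exons, 50)
```

With no padding the intronic position 1205 lies outside every target, so a variant there would never be reported; with 50 bp padding the two targets merge into one region that includes it.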
Affiliation(s)
- Maria Zanti
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
  - Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Kyriaki Michailidou
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
  - Biostatistics Unit, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Maria A Loizidou
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
- Christina Machattou
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Panagiota Pirpa
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Kyproula Christodoulou
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
  - Neurogenetics Department, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- George M Spyrou
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
  - Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
- Kyriacos Kyriacou
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
- Andreas Hadjisavvas
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, 2371, Nicosia, Cyprus
  - Cyprus School of Molecular Medicine, 2371, Nicosia, Cyprus
5
Sadler B, Wilborn J, Antunes L, Kuensting T, Hale AT, Gannon SR, McCall K, Cruchaga C, Harms M, Voisin N, Reymond A, Cappuccio G, Brunetti-Pierri N, Tartaglia M, Niceta M, Leoni C, Zampino G, Ashley-Koch A, Urbizu A, Garrett ME, Soldano K, Macaya A, Conrad D, Strahle J, Dobbs MB, Turner TN, Shannon CN, Brockmeyer D, Limbrick DD, Gurnett CA, Haller G. Rare and de novo coding variants in chromodomain genes in Chiari I malformation. Am J Hum Genet 2021; 108:100-114. [PMID: 33352116 PMCID: PMC7820723 DOI: 10.1016/j.ajhg.2020.12.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/27/2020] [Accepted: 11/24/2020] [Indexed: 12/16/2022] Open
Abstract
Chiari I malformation (CM1), the displacement of the cerebellum through the foramen magnum into the spinal canal, is one of the most common pediatric neurological conditions. Individuals with CM1 can present with neurological symptoms, including severe headaches and sensory or motor deficits, often as a consequence of brainstem compression or syringomyelia (SM). We conducted whole-exome sequencing (WES) on 668 CM1 probands and 232 family members and performed gene-burden and de novo enrichment analyses. A significant enrichment of rare and de novo non-synonymous variants in chromodomain (CHD) genes was observed among individuals with CM1 (combined p = 2.4 × 10⁻¹⁰), including 3 de novo loss-of-function variants in CHD8 (LOF enrichment p = 1.9 × 10⁻¹⁰) and a significant burden of rare transmitted variants in CHD3 (p = 1.8 × 10⁻⁶). Overall, individuals with CM1 were found to have significantly increased head circumference (p = 2.6 × 10⁻⁹), with many harboring CHD rare variants having macrocephaly. Finally, haploinsufficiency for chd8 in zebrafish led to macrocephaly and posterior hindbrain displacement reminiscent of CM1. These results implicate chromodomain genes and excessive brain growth in CM1 pathogenesis.
Affiliation(s)
- Brooke Sadler
  - Department of Pediatrics, Washington University, St. Louis, MO 63110, USA
- Jackson Wilborn
  - Department of Neurosurgery, Washington University, St. Louis, MO 63110, USA
- Lilian Antunes
  - Department of Orthopaedic Surgery, Washington University, St. Louis, MO 63110, USA
- Timothy Kuensting
  - Department of Neurosurgery, Washington University, St. Louis, MO 63110, USA
- Andrew T Hale
  - Division of Genetic Medicine, Vanderbilt University Medical Center & Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Stephen R Gannon
  - Division of Pediatric Neurosurgery and Surgical Outcomes Center for Kids, Monroe Carell Jr. Children's Hospital of Vanderbilt University, Nashville, TN 37232, USA
- Kevin McCall
  - Department of Orthopaedic Surgery, Washington University, St. Louis, MO 63110, USA
- Carlos Cruchaga
  - Department of Psychiatry, Washington University, St. Louis, MO 63110, USA
- Matthew Harms
  - Department of Neurology, Columbia University, New York, NY 10027, USA
- Norine Voisin
  - Center for Integrative Genomics (CIG), University of Lausanne, Lausanne 1015, Switzerland
- Alexandre Reymond
  - Center for Integrative Genomics (CIG), University of Lausanne, Lausanne 1015, Switzerland
- Gerarda Cappuccio
  - Department of Translational Medicine, Section of Pediatrics, Federico II University, Naples 80138, Italy
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli 80078, Italy
- Nicola Brunetti-Pierri
  - Department of Translational Medicine, Section of Pediatrics, Federico II University, Naples 80138, Italy
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli 80078, Italy
- Marco Tartaglia
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù, IRCCS, Rome 00165, Italy
- Marcello Niceta
  - Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù, IRCCS, Rome 00165, Italy
- Chiara Leoni
  - Center for Rare Diseases and Birth Defects, Department of Woman and Child Health and Public Health, Fondazione-Policlinico-Universitario-A. Gemelli-IRCCS, Rome 00168, Italy
- Giuseppe Zampino
  - Center for Rare Diseases and Birth Defects, Department of Woman and Child Health and Public Health, Fondazione-Policlinico-Universitario-A. Gemelli-IRCCS, Rome 00168, Italy
- Allison Ashley-Koch
  - Duke Molecular Physiology Institute, Department of Medicine, Duke University, Durham, NC 27708, USA
- Aintzane Urbizu
  - Duke Molecular Physiology Institute, Department of Medicine, Duke University, Durham, NC 27708, USA
- Melanie E Garrett
  - Duke Molecular Physiology Institute, Department of Medicine, Duke University, Durham, NC 27708, USA
- Karen Soldano
  - Duke Molecular Physiology Institute, Department of Medicine, Duke University, Durham, NC 27708, USA
- Alfons Macaya
  - Pediatric Neurology Research group, University Hospital Vall d'Hebron, Barcelona 08035, Spain
- Donald Conrad
  - Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR 97006, USA
- Jennifer Strahle
  - Department of Neurosurgery, Washington University, St. Louis, MO 63110, USA
- Matthew B Dobbs
  - Department of Orthopaedic Surgery, Washington University, St. Louis, MO 63110, USA
  - Shriners Hospital for Children, St. Louis, MO 63110, USA
- Tychele N Turner
  - Department of Genetics, Washington University, St. Louis, MO 63110, USA
- Chevis N Shannon
  - Division of Genetic Medicine, Vanderbilt University Medical Center & Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Douglas Brockmeyer
  - Department of Neurological Surgery, University of Utah, Primary Children's Hospital, Salt Lake City, UT 84113, USA
- David D Limbrick
  - Department of Neurosurgery, Washington University, St. Louis, MO 63110, USA
- Christina A Gurnett
  - Department of Pediatrics, Washington University, St. Louis, MO 63110, USA
  - Department of Orthopaedic Surgery, Washington University, St. Louis, MO 63110, USA
  - Department of Neurology, Washington University, St. Louis, MO 63110, USA
- Gabe Haller
  - Department of Neurosurgery, Washington University, St. Louis, MO 63110, USA
  - Department of Neurology, Washington University, St. Louis, MO 63110, USA
  - Department of Genetics, Washington University, St. Louis, MO 63110, USA
6
Chen J, Guo JT. Comparative assessments of indel annotations in healthy and cancer genomes with next-generation sequencing data. BMC Med Genomics 2020; 13:170. [PMID: 33167946 PMCID: PMC7653722 DOI: 10.1186/s12920-020-00818-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/18/2020] [Accepted: 10/29/2020] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Insertion and deletion (indel) is one of the major variation types in human genomes. Accurate annotation of indels is of paramount importance in genetic variation analysis and investigation of their roles in human diseases. Previous studies revealed a high number of false positives from existing indel calling methods, which limits downstream analyses of the effects of indels on both healthy and disease genomes. In this study, we evaluated seven commonly used general indel calling programs for germline indels and four somatic indel calling programs through comparative analysis to investigate their common features and differences and to explore ways to improve indel annotation accuracy. METHODS: In our comparative analysis, we adopted a more stringent evaluation approach by considering both the indel positions and the indel types (insertion or deletion sequences) between the samples and the reference set. In addition, we applied an efficient way to use a benchmark for improved performance comparisons for the general indel calling programs. RESULTS: We found that germline indels in healthy genomes derived by combining several indel calling tools could help remove a large number of false positive indels from individual programs without compromising the number of true positives. The performance comparisons of somatic indel calling programs are more complicated due to the lack of a reliable and comprehensive benchmark. Nevertheless, our results revealed large variations among the programs and among cancer types. CONCLUSIONS: While more accurate indel calling programs are needed, we found that the performance for germline indel annotations can be improved by combining the results from several programs. In addition, well-designed benchmarks for both germline and somatic indels are key in program development and evaluations.
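The stricter matching rule described in the Methods (two indel calls agree only when both the position and the inserted or deleted sequence agree) and the idea of combining several callers can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the tool names, the `min_support` threshold, and the toy calls are all hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Indel = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def consensus_indels(calls_by_tool: Dict[str, Set[Indel]], min_support: int) -> Set[Indel]:
    """Keep indels reported identically (position and alleles) by >= min_support tools."""
    support = Counter(indel for calls in calls_by_tool.values() for indel in calls)
    return {indel for indel, n in support.items() if n >= min_support}


def precision_recall(calls: Set[Indel], truth: Set[Indel]) -> Tuple[float, float]:
    """Exact-match precision and recall of a call set against a benchmark set."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy benchmark and per-tool call sets (hypothetical data).
truth = {("chr1", 100, "A", "AT"), ("chr1", 200, "GCA", "G")}
calls_by_tool = {
    "tool_a": {("chr1", 100, "A", "AT"), ("chr1", 200, "GCA", "G"), ("chr1", 300, "C", "CG")},
    "tool_b": {("chr1", 100, "A", "AT"), ("chr1", 200, "GCA", "G"), ("chr1", 200, "GC", "G")},
    "tool_c": {("chr1", 100, "A", "AT"), ("chr1", 300, "C", "CGG")},
}
consensus = consensus_indels(calls_by_tool, min_support=2)
```

In this toy case the two calls at position 300 share a position but differ in inserted sequence, so under the strict rule they do not support each other and are dropped from the consensus, while both benchmark indels are retained.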
Affiliation(s)
- Jing Chen
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
- Jun-Tao Guo
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
7
Surface LE, Burrow DT, Li J, Park J, Kumar S, Lyu C, Song N, Yu Z, Rajagopal A, Bae Y, Lee BH, Mumm S, Gu CC, Baker JC, Mohseni M, Sum M, Huskey M, Duan S, Bijanki VN, Civitelli R, Gardner MJ, McAndrew CM, Ricci WM, Gurnett CA, Diemer K, Wan F, Costantino CL, Shannon KM, Raje N, Dodson TB, Haber DA, Carette JE, Varadarajan M, Brummelkamp TR, Birsoy K, Sabatini DM, Haller G, Peterson TR. ATRAID regulates the action of nitrogen-containing bisphosphonates on bone. Sci Transl Med 2020; 12:eaav9166. [PMID: 32434850 PMCID: PMC7882121 DOI: 10.1126/scitranslmed.aav9166] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/02/2018] [Revised: 01/28/2020] [Accepted: 04/29/2020] [Indexed: 11/02/2022]
Abstract
Nitrogen-containing bisphosphonates (N-BPs), such as alendronate, are the most widely prescribed medications for diseases involving bone, with nearly 200 million prescriptions written annually. Recently, widespread use of N-BPs has been challenged due to the risk of rare but traumatic side effects such as atypical femoral fracture (AFF) and osteonecrosis of the jaw (ONJ). N-BPs bind to and inhibit farnesyl diphosphate synthase, resulting in defects in protein prenylation. Yet, it remains poorly understood what other cellular factors might allow N-BPs to exert their pharmacological effects. Here, we performed genome-wide studies in cells and patients to identify the poorly characterized gene, ATRAID. Loss of ATRAID function results in selective resistance to N-BP-mediated loss of cell viability and the prevention of alendronate-mediated inhibition of prenylation. ATRAID is required for alendronate inhibition of osteoclast function, and ATRAID-deficient mice have impaired therapeutic responses to alendronate in both postmenopausal and senile (old age) osteoporosis models. Last, we performed exome sequencing on patients taking N-BPs who suffered ONJ or an AFF. ATRAID is one of three genes that contain rare nonsynonymous coding variants in patients with ONJ or an AFF that is also differentially expressed in poor outcome groups of patients treated with N-BPs. We functionally validated this patient variation in ATRAID as conferring cellular hypersensitivity to N-BPs. Our work adds key insight into the mechanistic action of N-BPs and the processes that might underlie differential responsiveness to N-BPs in people.
Affiliation(s)
- Lauren E Surface
  - Department of Molecular and Cellular Biology, Department of Chemistry and Chemical Biology, Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA
- Damon T Burrow
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Jinmei Li
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Jiwoong Park
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Sandeep Kumar
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Cheng Lyu
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Niki Song
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Zhou Yu
  - Department of Molecular and Cellular Biology, Department of Chemistry and Chemical Biology, Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA
- Abbhirami Rajagopal
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Yangjin Bae
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Brendan H Lee
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Steven Mumm
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
  - Center for Metabolic Bone Disease and Molecular Research, Shriners Hospital for Children, St. Louis, MO 63110, USA
- Charles C Gu
  - Division of Biostatistics, Washington University School of Medicine, 660 S. Euclid Ave., Campus Box 8067, St. Louis, MO 63110, USA
- Jonathan C Baker
  - Mallinckrodt Institute of Radiology, Washington University School of Medicine, 510 S. Kingshighway Blvd., St. Louis, MO 63110, USA
- Mahshid Mohseni
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Melissa Sum
  - Division of Endocrinology, Diabetes and Metabolism, NYU Langone Health, 530 1st Ave., Schwartz 5E., New York, NY 10016, USA
- Margaret Huskey
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Shenghui Duan
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Vinieth N Bijanki
  - Center for Metabolic Bone Disease and Molecular Research, Shriners Hospital for Children, St. Louis, MO 63110, USA
- Roberto Civitelli
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Michael J Gardner
  - Department of Orthopedic Surgery, Stanford University, 450 Broadway Street, Redwood City, CA 94063, USA
- Chris M McAndrew
  - Department of Orthopedic Surgery, Washington University School of Medicine, 4938 Parkview Place, St. Louis, MO 63110, USA
- William M Ricci
  - Hospital for Special Surgery Main Campus-Belaire Building, 525 East 71st Street 2nd Floor, New York, NY 10021, USA
- Christina A Gurnett
  - Department of Orthopedic Surgery, Washington University School of Medicine, 4938 Parkview Place, St. Louis, MO 63110, USA
  - Department of Neurology, Washington University School of Medicine, Campus Box 8111, 660 S. Euclid Ave., St. Louis, MO 63110, USA
- Kathryn Diemer
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
- Fei Wan
  - Department of Surgery, Washington University School of Medicine, Campus Box 8109, 4590 Children's Place, Suite 9600, St. Louis, MO 63110, USA
- Christina L Costantino
  - Massachusetts General Hospital Cancer Center and Department of Surgery, Harvard Medical School, Boston, MA 02114, USA
- Kristen M Shannon
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA 02114, USA
- Noopur Raje
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA 02114, USA
- Thomas B Dodson
  - Department of Oral and Maxillofacial Surgery, Massachusetts General Hospital and Harvard School of Dental Medicine, Boston, MA 02114, USA
- Daniel A Haber
  - Massachusetts General Hospital Cancer Center, Harvard Medical School, Boston, MA 02114, USA
  - Howard Hughes Medical Institute (HHMI), Chevy Chase, MD 20815, USA
- Jan E Carette
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Malini Varadarajan
  - Oncology Disease Area, Novartis Institutes for BioMedical Research, Cambridge, CA 02140, USA
- Thijn R Brummelkamp
  - Oncode Institute, Division of Biochemistry, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX Amsterdam, Netherlands
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
  - Cancer Genomics Center, Plesmanlaan 121, 1066CX Amsterdam, Netherlands
- Kivanc Birsoy
  - The Rockefeller University, 1230 York Ave., New York, NY 10065, USA
- David M Sabatini
  - Howard Hughes Medical Institute (HHMI), Chevy Chase, MD 20815, USA
  - Whitehead Institute, 9 Cambridge Center, Cambridge, MA 02139, USA
  - Department of Biology, Massachusetts Institute of Technology (MIT), 77 Massachusetts Avenue, Cambridge, MA 02139, USA
  - David H. Koch Center for Integrative Cancer Research at MIT, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Gabe Haller
  - Department of Neurology, Washington University School of Medicine, Campus Box 8111, 660 S. Euclid Ave., St. Louis, MO 63110, USA
  - Department of Neurosurgery, Washington University School of Medicine, Campus Box 8057, 660 S. Euclid Ave., St. Louis, MO 63110, USA
- Timothy R Peterson
  - Division of Bone and Mineral Diseases, Department of Medicine, Washington University School of Medicine, BJC Institute of Health, 425 S. Euclid Ave., St. Louis, MO 63110, USA
  - Department of Genetics, Washington University School of Medicine, 4515 McKinley Ave. Campus Box 8232, St. Louis, MO 63110, USA
  - Institute for Public Health, Washington University School of Medicine, 600 S. Taylor Suite 2400, Campus Box 8217, St. Louis, MO 63110, USA
8
Hypermutator Pseudomonas aeruginosa Exploits Multiple Genetic Pathways To Develop Multidrug Resistance during Long-Term Infections in the Airways of Cystic Fibrosis Patients. Antimicrob Agents Chemother 2020; 64:AAC.02142-19. [PMID: 32071060 DOI: 10.1128/aac.02142-19] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 10/23/2019] [Accepted: 12/20/2019] [Indexed: 12/30/2022] Open
Abstract
Pseudomonas aeruginosa exploits intrinsic and acquired resistance mechanisms to resist almost every antibiotic used in chemotherapy. Antimicrobial resistance in P. aeruginosa isolates recovered from cystic fibrosis (CF) patients is further enhanced by the occurrence of hypermutator strains, a hallmark of chronic infections in CF patients. However, the within-patient genetic diversity of P. aeruginosa populations related to antibiotic resistance remains unexplored. Here, we show the evolution of the mutational resistome profile of a P. aeruginosa hypermutator lineage by performing longitudinal and transversal analyses of isolates collected from a CF patient throughout 20 years of chronic infection. Our results show the accumulation of thousands of mutations, with an overall evolutionary history characterized by purifying selection. However, mutations in antibiotic resistance genes appear to have been positively selected, driven by antibiotic treatment. Antibiotic resistance increased as infection progressed toward the establishment of a population constituted by genotypically diversified coexisting sublineages, all of which converged to multidrug resistance. These sublineages emerged by parallel evolution through distinct evolutionary pathways, which affected genes of the same functional categories. Interestingly, ampC and ftsI, encoding the β-lactamase and penicillin-binding protein 3, respectively, were found to be among the most frequently mutated genes. In fact, both genes were targeted by multiple independent mutational events, which led to a wide diversity of coexisting alleles underlying β-lactam resistance. Our findings indicate that hypermutators, apart from boosting antibiotic resistance evolution by simultaneously targeting several genes, favor the emergence of adaptive innovative alleles by clustering beneficial/compensatory mutations in the same gene, hence expanding P. aeruginosa strategies for persistence.
|
9
|
Abstract
Storing biologically equivalent indels as distinct entries in databases causes data redundancy, and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Moreover, a unified system is also desirable to compare the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which also can be used to compare different indel calling results. UPS-indel identifies 15% redundant indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel is able to identify 456,352 more redundant indels in dbSNP; 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding indel dataset in addition to the ones reported jointly by these tools. Moreover, comparing UPS-indel to state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels, and thus exhaustive.
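The redundancy described in this abstract can be illustrated with a minimal sketch of conventional left-alignment and trimming, the approach taken by normalizers such as vt normalize. This is not UPS-indel's own positioning system, which the abstract reports goes further than such tools. The function name `left_normalize`, the 0-based coordinates, and the toy reference sequence are assumptions made for illustration only.

```python
def left_normalize(ref_seq, pos, ref, alt):
    """Left-align and trim one biallelic variant.

    ref_seq: reference sequence as a string (toy example, assumed in memory)
    pos:     0-based index in ref_seq of the first base of `ref`
    ref/alt: reference and alternate alleles
    Returns (pos, ref, alt) in left-aligned, parsimonious form.
    """
    while True:
        if ref and alt and ref[-1] == alt[-1]:
            # Both alleles end in the same base: trim it from the right.
            ref, alt = ref[:-1], alt[:-1]
        elif not ref or not alt:
            # An allele became empty: extend both one base to the left.
            if pos == 0:
                raise ValueError("cannot extend left of the sequence start")
            pos -= 1
            base = ref_seq[pos]
            ref, alt = base + ref, base + alt
        else:
            break
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Two ways of writing the same deletion of one "CA" unit in a CA repeat.
reference = "GCACACAT"
record_a = (0, "GCA", "G")   # deletion reported at the left end of the repeat
record_b = (4, "ACA", "A")   # same deletion reported at the right end

normalized = {left_normalize(reference, *rec) for rec in (record_a, record_b)}
```

After normalization both records share one coordinate and allele pair, so a set (or a database key) built on the normalized form stores the deletion once.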
|
10
|
Song X, Haghighi A, Iliuta IA, Pei Y. Molecular diagnosis of autosomal dominant polycystic kidney disease. Expert Rev Mol Diagn 2017; 17:885-895. [PMID: 28724316 DOI: 10.1080/14737159.2017.1358088] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 01/06/2023]
Abstract
INTRODUCTION Autosomal dominant polycystic kidney disease (ADPKD) is the most common inherited kidney disease that accounts for 5-10% of end-stage renal disease in developed countries. Mutations in PKD1 and PKD2 account for a majority of cases. Mutation screening of PKD1 is technically challenging largely due to the complexity resulting from duplication of its first 33 exons in six highly homologous pseudogenes (i.e. PKD1P1-P6). Protocol using locus-specific long-range and nested PCR has enabled comprehensive PKD1 mutation screening but is labor-intensive and costly. Here, the authors review how recent advances in Next Generation Sequencing are poised to transform and extend molecular diagnosis of ADPKD. Areas covered: Key original research articles and reviews of the topic published in English identified through PubMed from 1957-2017. Expert commentary: The authors review current and evolving approaches using targeted resequencing or whole genome sequencing for screening typical as well as challenging cases (e.g. cases with no detectable PKD1 and PKD2 mutations which may be due to somatic mosaicism or other cystic disease; and complex genetics such as bilineal disease).
Affiliation(s)
- Xuewen Song
- Division of Nephrology, University Health Network and University of Toronto, Toronto, ON, Canada
- Amirreza Haghighi
- Division of Nephrology, University Health Network and University of Toronto, Toronto, ON, Canada
- Ioan-Andrei Iliuta
- Division of Nephrology, University Health Network and University of Toronto, Toronto, ON, Canada
- York Pei
- Division of Nephrology, University Health Network and University of Toronto, Toronto, ON, Canada
|
11
|
Wu SH, Schwartz RS, Winter DJ, Conrad DF, Cartwright RA. Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions. Bioinformatics 2017; 33:2322-2329. [PMID: 28334373 PMCID: PMC5860108 DOI: 10.1093/bioinformatics/btx133] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/19/2016] [Revised: 01/22/2017] [Accepted: 03/07/2017] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number-variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others. RESULTS We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model was comprised of two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. AVAILABILITY AND IMPLEMENTATION Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). CONTACT cartwright@asu.edu. SUPPLEMENTARY INFORMATION Supplementary data is available at Bioinformatics online.
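As a rough illustration of the two-component model this abstract describes, the sketch below evaluates a Dirichlet-multinomial log-probability and asks which mixture component better explains a site's read counts. It is not the authors' code (that is at the GitHub URL in the abstract). The reduction to two count categories (reference, alternate), the mixture weights, and the `alpha` values are hypothetical and chosen only to mimic a tight, unbiased major component and an overdispersed, reference-biased minor one.

```python
from math import exp, lgamma, log


def dm_logpmf(counts, alpha):
    """Log-probability of `counts` under a Dirichlet-multinomial with parameters `alpha`."""
    n = sum(counts)
    a0 = sum(alpha)
    out = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for x, a in zip(counts, alpha):
        out += lgamma(x + a) - lgamma(x + 1) - lgamma(a)
    return out


def responsibilities(counts, weights, alphas):
    """Posterior probability of each mixture component for one site."""
    logs = [log(w) + dm_logpmf(counts, a) for w, a in zip(weights, alphas)]
    top = max(logs)                       # subtract the max for numerical stability
    unnorm = [exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical parameters: component 0 is near-binomial around 0.5,
# component 1 is overdispersed and skewed toward the reference allele.
weights = [0.9, 0.1]
alphas = [(100.0, 100.0), (3.0, 1.0)]

balanced_site = responsibilities((20, 20), weights, alphas)   # ref=20, alt=20
skewed_site = responsibilities((35, 5), weights, alphas)      # ref=35, alt=5
```

Under these assumed parameters the balanced site is assigned mostly to the major component and the skewed site mostly to the minor one. The abstract proposes removing sites of the latter kind to improve genotype accuracy.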
Affiliation(s)
- Steven H Wu
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Rachel S Schwartz
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Department of Biological Sciences, The University of Rhode Island, Kingston, RI, USA
- David J Winter
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Donald F Conrad
- Department of Genetics, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, USA
- Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
12
|
Witzel M, Petersheim D, Fan Y, Bahrami E, Racek T, Rohlfs M, Puchałka J, Mertes C, Gagneur J, Ziegenhain C, Enard W, Stray-Pedersen A, Arkwright PD, Abboud MR, Pazhakh V, Lieschke GJ, Krawitz PM, Dahlhoff M, Schneider MR, Wolf E, Horny HP, Schmidt H, Schäffer AA, Klein C. Chromatin-remodeling factor SMARCD2 regulates transcriptional networks controlling differentiation of neutrophil granulocytes. Nat Genet 2017; 49:742-752. [PMID: 28369036 DOI: 10.1038/ng.3833] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Received: 08/16/2016] [Accepted: 03/10/2017] [Indexed: 02/06/2023]
Abstract
We identify SMARCD2 (SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily D, member 2), also known as BAF60b (BRG1/Brahma-associated factor 60b), as a critical regulator of myeloid differentiation in humans, mice, and zebrafish. Studying patients from three unrelated pedigrees characterized by neutropenia, specific granule deficiency, myelodysplasia with excess of blast cells, and various developmental aberrations, we identified three homozygous loss-of-function mutations in SMARCD2. Using mice and zebrafish as model systems, we showed that SMARCD2 controls early steps in the differentiation of myeloid-erythroid progenitor cells. In vitro, SMARCD2 interacts with the transcription factor CEBPɛ and controls expression of neutrophil proteins stored in specific granules. Defective expression of SMARCD2 leads to transcriptional and chromatin changes in acute myeloid leukemia (AML) human promyelocytic cells. In summary, SMARCD2 is a key factor controlling myelopoiesis and is a potential tumor suppressor in leukemia.
Affiliation(s)
- Maximilian Witzel
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-Universität München, Munich, Germany; Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Daniel Petersheim
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Yanxin Fan
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Ehsan Bahrami
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Tomas Racek
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Meino Rohlfs
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Jacek Puchałka
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Christian Mertes
- Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Julien Gagneur
- Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany; Department of Informatics, Technical University of Munich, Munich, Germany
- Christoph Ziegenhain
- Anthropology and Human Genomics, Department of Biology II, Faculty of Biology, Ludwig-Maximilians-Universität München, Munich, Germany
- Wolfgang Enard
- Anthropology and Human Genomics, Department of Biology II, Faculty of Biology, Ludwig-Maximilians-Universität München, Munich, Germany
- Peter D Arkwright
- Department of Paediatric Allergy and Immunology, University of Manchester, Royal Manchester Children's Hospital, Manchester, UK
- Miguel R Abboud
- Department of Pediatrics and Adolescent Medicine, American University of Beirut Medical Center, Beirut, Lebanon
- Vahid Pazhakh
- Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia
- Graham J Lieschke
- Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia
- Peter M Krawitz
- Medical Genetics and Human Genetic, Charite University Hospital, Berlin, Germany
- Maik Dahlhoff
- Molecular Animal Breeding and Biotechnology, Gene Center Ludwig-Maximilians-Universität München, Munich, Germany
- Marlon R Schneider
- Molecular Animal Breeding and Biotechnology, Gene Center Ludwig-Maximilians-Universität München, Munich, Germany
- Eckhard Wolf
- Molecular Animal Breeding and Biotechnology, Gene Center Ludwig-Maximilians-Universität München, Munich, Germany
- Hans-Peter Horny
- Pathology Institute, Faculty of Medicine, Ludwig-Maximilians-Universität München, Munich, Germany
- Heinrich Schmidt
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Alejandro A Schäffer
- National Center for Biotechnology Information, US National Institutes of Health, US Department of Health and Human Services, Bethesda, Maryland, USA
- Christoph Klein
- Department of Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilians-Universität München, Munich, Germany; Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
|
13
|
Han Y, He X. Integrating Epigenomics into the Understanding of Biomedical Insight. Bioinform Biol Insights 2016; 10:267-289. [PMID: 27980397 PMCID: PMC5138066 DOI: 10.4137/bbi.s38427] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/17/2016] [Revised: 11/01/2016] [Accepted: 11/06/2016] [Indexed: 12/13/2022] Open
Abstract
Epigenetics is one of the most rapidly expanding fields in biomedical research, and the popularity of high-throughput next-generation sequencing (NGS) highlights the accelerating speed of epigenomics discovery over the past decade. Epigenetics studies the heritable phenotypes resulting from chromatin changes without alteration of the DNA sequence. Epigenetic factors and their interactive network regulate almost all of the fundamental biological processes, and incorrect epigenetic information may lead to complex diseases. A comprehensive, genome-wide understanding of epigenetic mechanisms, their interactions, and their alterations in health and disease has become a priority in biological research. Bioinformatics is expected to make a remarkable contribution for this purpose, especially in processing and interpreting large-scale NGS datasets. In this review, we introduce the pioneering achievements of epigenetics in health status and complex diseases; next, we give a systematic review of epigenomics data generation, summarize public resources and integrative analysis approaches, and finally outline the challenges and future directions in computational epigenomics.
Affiliation(s)
- Yixing Han
- Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, USA. Present address: Genetics and Biochemistry Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Ximiao He
- Laboratory of Metabolism, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA. Present address: Department of Medical Genetics, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
|
14
|
Fu CY, Liu WG, Liu DL, Li JH, Zhu MS, Liao YL, Liu ZR, Zeng XQ, Wang F. Genome-wide DNA polymorphism in the indica rice varieties RGD-7S and Taifeng B as revealed by whole genome re-sequencing. Genome 2016; 59:197-207. [PMID: 26926666 DOI: 10.1139/gen-2015-0101] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
Next-generation sequencing technologies provide opportunities to further understand genetic variation, even within closely related cultivars. We performed whole genome resequencing of two elite indica rice varieties, RGD-7S and Taifeng B, whose F1 progeny showed hybrid weakness and hybrid vigor when grown in the early- and late-cropping seasons, respectively. Approximately 150 million 100-bp paired-end reads were generated, which covered ∼86% of the rice (Oryza sativa L. japonica 'Nipponbare') reference genome. A total of 2,758,740 polymorphic sites, including 2,408,845 SNPs and 349,895 InDels, were detected in RGD-7S and Taifeng B. Applying stringent parameters, we identified 961,791 SNPs and 46,640 InDels between RGD-7S and Taifeng B (RGD-7S/Taifeng B). The density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per 100 kb for RGD-7S/Taifeng B. Copy number variations (CNVs) were also investigated. In RGD-7S, 1989 of 2727 CNVs overlapped 218 genes, and in Taifeng B, 1231 of 2010 CNVs were annotated in 175 genes. In addition, we verified a subset of InDels in the interval of the hybrid weakness genes Hw3 and Hw4 and obtained some polymorphic InDel markers, which will provide a sound foundation for cloning hybrid weakness genes. Analysis of genomic variations will also contribute to understanding the genetic basis of hybrid weakness and heterosis.
Affiliation(s)
- Chong-Yun Fu
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, P.R. China; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, P.R. China
- Wu-Ge Liu
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, P.R. China; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, P.R. China
- Di-Lin Liu
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, P.R. China; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, P.R. China
- Ji-Hua Li
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, P.R. China; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, P.R. China
- Man-Shan Zhu
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, P.R. China; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, P.R. China
- Yi-Long Liao
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, P.R. China; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, P.R. China
- Zhen-Rong Liu
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, P.R. China; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, P.R. China
- Xue-Qin Zeng
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, P.R. China; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, P.R. China
- Feng Wang
- Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, P.R. China; Guangdong Provincial Key Laboratory of New Technology in Rice Breeding, Guangzhou 510640, P.R. China
|
15
|
Replication Errors Made During Oogenesis Lead to Detectable De Novo mtDNA Mutations in Zebrafish Oocytes with a Low mtDNA Copy Number. Genetics 2016; 204:1423-1431. [PMID: 27770035 DOI: 10.1534/genetics.116.194035] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/21/2016] [Accepted: 10/13/2016] [Indexed: 01/30/2023] Open
Abstract
Of all pathogenic mitochondrial DNA (mtDNA) mutations in humans, ∼25% are de novo, although their occurrence in oocytes has never been directly assessed. We used next-generation sequencing to detect point mutations directly in the mtDNA of 3-15 individual mature oocytes and three somatic tissues from eight zebrafish females. Various statistical and biological filters allowed reliable detection of de novo variants with heteroplasmy ≥1.5%. In total, we detected 38 de novo base substitutions, but no insertions or deletions. These 38 de novo mutations were present in 19 of 103 mature oocytes, indicating that ∼20% of the mature oocytes carry at least one de novo mutation with heteroplasmy ≥1.5%. This frequency of de novo mutations is close to that deduced from the reported error rate of polymerase gamma, the mitochondrial replication enzyme, implying that mtDNA replication errors made during oogenesis are a likely explanation. Substantial variation in the mutation prevalence among mature oocytes can be explained by the highly variable mtDNA copy number, since we previously reported that ∼20% of the primordial germ cells have an mtDNA copy number of ≤73, which would lead to detectable mutation loads. In conclusion, replication errors made during oogenesis are an important source of de novo mtDNA base substitutions, and their location and heteroplasmy level determine their significance.
|
16
|
Jochumsen N, Marvig RL, Damkiær S, Jensen RL, Paulander W, Molin S, Jelsbak L, Folkesson A. The evolution of antimicrobial peptide resistance in Pseudomonas aeruginosa is shaped by strong epistatic interactions. Nat Commun 2016; 7:13002. [PMID: 27694971 PMCID: PMC5494192 DOI: 10.1038/ncomms13002] [Citation(s) in RCA: 99] [Impact Index Per Article: 11.0] [Received: 04/23/2015] [Accepted: 08/24/2016] [Indexed: 11/25/2022] Open
Abstract
Colistin is an antimicrobial peptide that has become the only remaining alternative for the treatment of multidrug-resistant Gram-negative bacterial infections, but little is known of how clinical levels of colistin resistance evolve. We use in vitro experimental evolution and whole-genome sequencing of colistin-resistant Pseudomonas aeruginosa isolates from cystic fibrosis patients to reconstruct the molecular evolutionary pathways open for high-level colistin resistance. We show that the evolution of resistance is a complex, multistep process that requires mutation in at least five independent loci that synergistically create the phenotype. Strong intergenic epistasis limits the number of possible evolutionary pathways to resistance. Mutations in transcriptional regulators are essential for resistance evolution and function as nodes that potentiate further evolution towards higher resistance by functionalizing and increasing the effect of the other mutations. These results add to our understanding of clinical antimicrobial peptide resistance and the prediction of resistance evolution. Colistin is an antibiotic used in the treatment of Pseudomonas aeruginosa infections in cystic fibrosis patients. Here, Jochumsen et al. reconstruct the pathways for the molecular evolution of colistin resistance in P. aeruginosa and show that the number of pathways is highly constrained by interactions among genes.
Affiliation(s)
- Nicholas Jochumsen
- Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark
- Rasmus L Marvig
- Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark; Center for Genomic Medicine, Rigshospitalet, 2100 Copenhagen, Denmark
- Søren Damkiær
- Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark
- Rune Lyngklip Jensen
- Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark
- Wilhelm Paulander
- Department of Veterinary Disease Biology, University of Copenhagen, 1870 Frederiksberg C, Denmark
- Søren Molin
- Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark
- Lars Jelsbak
- Department of Systems Biology, Technical University of Denmark, Kongens Lyngby, Denmark
- Anders Folkesson
- National Veterinary Institute, Technical University of Denmark, Frederiksberg, Denmark
|
17
|
Sommer LM, Marvig RL, Luján A, Koza A, Pressler T, Molin S, Johansen HK. Is genotyping of single isolates sufficient for population structure analysis of Pseudomonas aeruginosa in cystic fibrosis airways? BMC Genomics 2016; 17:589. [PMID: 27506816 PMCID: PMC4979127 DOI: 10.1186/s12864-016-2873-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 02/19/2016] [Accepted: 06/30/2016] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND The primary cause of morbidity and mortality in cystic fibrosis (CF) patients is lung infection by Pseudomonas aeruginosa. Therefore, much work has been done to understand the adaptation and evolution of P. aeruginosa in the CF lung. However, many of these studies have focused on longitudinally collected single isolates, and only a few have included cross-sectional analyses of entire P. aeruginosa populations in sputum samples. To date, only a few studies have used metagenomic analysis to investigate P. aeruginosa populations in CF airways. RESULTS We analysed five metagenomes together with longitudinally collected single isolates from four recently chronically infected CF patients. With this approach we were able to link the clone type and the majority of SNP profiles of the single isolates to that of the metagenome(s) for each individual patient. CONCLUSION Based on our analysis, we find that with access to comprehensive collections of longitudinal single isolates it is possible to rediscover the genotypes of the single isolates in the metagenomic samples. This suggests that information gained from genome sequencing of comprehensive collections of single isolates is satisfactory for many investigations of adaptation and evolution of P. aeruginosa to the CF airways.
Affiliation(s)
- Lea M Sommer
- The Technical University of Denmark, Center for Biosustainability, Hørsholm, Denmark; Rigshospitalet, Department of Clinical Microbiology, Copenhagen, Denmark
- Rasmus L Marvig
- Rigshospitalet, Department of Clinical Microbiology, Copenhagen, Denmark; Rigshospitalet, Center for Genomic Medicine, Copenhagen, Denmark
- Anna Koza
- The Technical University of Denmark, Center for Biosustainability, Hørsholm, Denmark
- Søren Molin
- The Technical University of Denmark, Center for Biosustainability, Hørsholm, Denmark; The Technical University of Denmark, Department of Systems Biology, Lyngby, Denmark
- Helle K Johansen
- The Technical University of Denmark, Center for Biosustainability, Hørsholm, Denmark; Rigshospitalet, Department of Clinical Microbiology, Copenhagen, Denmark
|
18
|
Wajnberg G, Passetti F. Using high-throughput sequencing transcriptome data for INDEL detection: challenges for cancer drug discovery. Expert Opin Drug Discov 2016; 11:257-68. [PMID: 26787005 DOI: 10.1517/17460441.2016.1143813] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022]
Abstract
INTRODUCTION A cancer cell is a mosaic of genomic and epigenomic alterations. Distinct cancer molecular signatures can be observed depending on tumor type or patient genetic background. One type of genomic alteration is the insertion and/or deletion (INDEL) of nucleotides in the DNA sequence, which may vary in length and may change the encoded protein or modify protein domains. INDELs are associated with a large number of diseases, and their detection has relied on low-throughput techniques. However, high-throughput sequencing has also started to be used for detection of novel disease-causing INDELs. This search may identify novel drug targets. AREAS COVERED This review presents examples of using high-throughput sequencing (DNA-Seq and RNA-Seq) to investigate the incidence of INDELs in coding regions of human genes. Some of these examples successfully utilized RNA-Seq to identify INDELs associated with diseases. In addition, other studies have described small INDELs related to chemo-resistance or poor outcome of patients, while structural variants were associated with a better clinical outcome. EXPERT OPINION On average, there is twice as much RNA-Seq data as DNA-Seq data available at the most used repositories for such data. Therefore, using RNA-Seq data is a promising strategy for studying cancer samples with unknown mechanisms of drug resistance, aiming at the discovery of proteins with potential as novel drug targets.
Affiliation(s)
- Gabriel Wajnberg
- Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ, Brazil
- Fabio Passetti
- Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ, Brazil
|
19
|
A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference. BioMed Research International 2015; 2015:456479. [PMID: 26539496 PMCID: PMC4619817 DOI: 10.1155/2015/456479] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 11/17/2014] [Accepted: 12/17/2014] [Indexed: 12/30/2022]
Abstract
High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are the best at analyzing this data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs the best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle with none of the tools achieving an average sensitivity higher than 33% or a Positive Predictive Value (PPV) higher than 53%. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as variant callers themselves.
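The two metrics quoted in this abstract, sensitivity and Positive Predictive Value, can be computed from a pipeline's call set and a truth set as in the sketch below. The variant keys and the toy sets are invented for illustration and are not data from the study. Matching on exact `(chrom, pos, ref, alt)` tuples is an assumption; real comparisons usually normalize representations first.

```python
def sensitivity_and_ppv(called, truth):
    """Return (sensitivity, PPV) for a set of called variants against a truth set.

    Variants are hashable keys, here (chrom, pos, ref, alt) tuples.
    """
    true_pos = len(called & truth)    # called and present in the truth set
    false_pos = len(called - truth)   # called but absent from the truth set
    false_neg = len(truth - called)   # in the truth set but not called
    sensitivity = true_pos / (true_pos + false_neg) if truth else 0.0
    ppv = true_pos / (true_pos + false_pos) if called else 0.0
    return sensitivity, ppv


# Toy example: four true variants, three calls, two of them correct.
truth_set = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "CT", "C"),
    ("chr2", 40, "G", "GAA"),
    ("chr2", 900, "T", "C"),
}
call_set = {
    ("chr1", 100, "A", "G"),
    ("chr2", 900, "T", "C"),
    ("chr3", 5, "C", "T"),
}

sens, ppv = sensitivity_and_ppv(call_set, truth_set)
```

Here sensitivity is 2/4 and PPV is 2/3. The abstract's indel figures (sensitivity no higher than 33%, PPV no higher than 53%) are averages of the same two ratios across pipelines.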
|
20
|
Pastukh V, Roberts JT, Clark DW, Bardwell GC, Patel M, Al-Mehdi AB, Borchert GM, Gillespie MN. An oxidative DNA "damage" and repair mechanism localized in the VEGF promoter is important for hypoxia-induced VEGF mRNA expression. Am J Physiol Lung Cell Mol Physiol 2015; 309:L1367-75. [PMID: 26432868 DOI: 10.1152/ajplung.00236.2015] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Received: 07/13/2015] [Accepted: 09/28/2015] [Indexed: 11/22/2022] Open
Abstract
In hypoxia, mitochondria-generated reactive oxygen species not only stimulate accumulation of the transcriptional regulator of hypoxic gene expression, hypoxia inducible factor-1 (Hif-1), but also cause oxidative base modifications in hypoxic response elements (HREs) of hypoxia-inducible genes. When the hypoxia-induced base modifications are suppressed, Hif-1 fails to associate with the HRE of the VEGF promoter, and VEGF mRNA accumulation is blunted. The mechanism linking base modifications to transcription is unknown. Here we determined whether recruitment of base excision DNA repair (BER) enzymes in response to hypoxia-induced promoter modifications was required for transcription complex assembly and VEGF mRNA expression. Using chromatin immunoprecipitation analyses in pulmonary artery endothelial cells, we found that hypoxia-mediated formation of the base oxidation product 8-oxoguanine (8-oxoG) in VEGF HREs was temporally associated with binding of Hif-1α and the BER enzymes 8-oxoguanine glycosylase 1 (Ogg1) and redox effector factor-1 (Ref-1)/apurinic/apyrimidinic endonuclease 1 (Ape1) and introduction of DNA strand breaks. Hif-1α colocalized with HRE sequences harboring Ref-1/Ape1, but not Ogg1. Inhibition of BER by small interfering RNA-mediated reduction in Ogg1 augmented hypoxia-induced 8-oxoG accumulation and attenuated Hif-1α and Ref-1/Ape1 binding to VEGF HRE sequences and blunted VEGF mRNA expression. Chromatin immunoprecipitation-sequence analysis of 8-oxoG distribution in hypoxic pulmonary artery endothelial cells showed that most of the oxidized base was localized to promoters with virtually no overlap between normoxic and hypoxic data sets. Transcription of genes whose promoters lost 8-oxoG during hypoxia was reduced, while those gaining 8-oxoG was elevated. Collectively, these findings suggest that the BER pathway links hypoxia-induced introduction of oxidative DNA modifications in promoters of hypoxia-inducible genes to transcriptional activation.
Affiliation(s)
- Viktor Pastukh
- Department of Pharmacology and Center for Lung Biology, University of South Alabama College of Medicine, Mobile, Alabama
- Justin T Roberts
- Department of Biology, College of Arts and Sciences, University of South Alabama, Mobile, Alabama
- David W Clark
- Department of Pharmacology and Center for Lung Biology, University of South Alabama College of Medicine, Mobile, Alabama
- Gina C Bardwell
- Department of Pharmacology and Center for Lung Biology, University of South Alabama College of Medicine, Mobile, Alabama
- Mita Patel
- Department of Pharmacology and Center for Lung Biology, University of South Alabama College of Medicine, Mobile, Alabama
- Abu-Bakr Al-Mehdi
- Department of Pharmacology and Center for Lung Biology, University of South Alabama College of Medicine, Mobile, Alabama
- Glen M Borchert
- Department of Pharmacology and Center for Lung Biology, University of South Alabama College of Medicine, Mobile, Alabama; Department of Biology, College of Arts and Sciences, University of South Alabama, Mobile, Alabama
- Mark N Gillespie
- Department of Pharmacology and Center for Lung Biology, University of South Alabama College of Medicine, Mobile, Alabama
|
21
|
Hasan MS, Wu X, Zhang L. Performance evaluation of indel calling tools using real short-read data. Hum Genomics 2015; 9:20. [PMID: 26286629 PMCID: PMC4545535 DOI: 10.1186/s40246-015-0042-2] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 04/09/2015] [Accepted: 07/20/2015] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Insertion and deletion (indel), a common form of genetic variation, has been shown to cause or contribute to human genetic diseases and cancer. With the advance of next-generation sequencing technology, many indel calling tools have been developed; however, evaluation and comparison of these tools using large-scale real data are still scant. Here we evaluated seven popular and publicly available indel calling tools, GATK UnifiedGenotyper, VarScan, Pindel, SAMtools, Dindel, GATK HaplotypeCaller, and Platypus, using low-coverage data for 78 human genomes from the 1000 Genomes project. RESULTS Comparing indels called by these tools with a known set of indels, we found that Platypus outperforms other tools. In addition, a high percentage of known indels still remain undetected and the number of common indels called by all seven tools is very low. CONCLUSION All these findings indicate the necessity of improving the existing tools or developing new algorithms to achieve reliable and consistent indel calling results.
Affiliation(s)
- Xiaowei Wu
- Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA.
- Liqing Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA.
|
22
|
Boschiero C, Gheyas AA, Ralph HK, Eory L, Paton B, Kuo R, Fulton J, Preisinger R, Kaiser P, Burt DW. Detection and characterization of small insertion and deletion genetic variants in modern layer chicken genomes. BMC Genomics 2015; 16:562. [PMID: 26227840 PMCID: PMC4563830 DOI: 10.1186/s12864-015-1711-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/17/2014] [Accepted: 06/22/2015] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Small insertions and deletions (InDels) constitute the second most abundant class of genetic variants and have been found to be associated with many traits and diseases. The present study reports on the detection and characterisation of about 883 K high quality InDels from the whole-genome analysis of several modern layer chicken lines from diverse breeds. RESULTS To reduce the error rates seen in InDel detection, this study used the consensus set from two InDel-calling packages: SAMtools and Dindel, as well as stringent post-filtering criteria. By analysing sequence data from 163 chickens from 11 commercial and 5 experimental layer lines, this study detected about 883 K high quality consensus InDels with 93% validation rate and an average density of 0.78 InDels/kb over the genome. Certain chromosomes, viz, GGAZ, 16, 22 and 25 showed very low densities of InDels whereas the highest rate was observed on GGA6. In spite of the higher recombination rates on microchromosomes, the InDel density on these chromosomes was generally lower relative to macrochromosomes possibly due to their higher gene density. About 43-87% of the InDels were found to be fixed within each line. The majority of detected InDels (86%) were 1-5 bases and about 63% were non-repetitive in nature while the rest were tandem repeats of various motif types. Functional annotation identified 613 frameshift, 465 non-frameshift and 10 stop-gain/loss InDels. Apart from the frameshift and stop-gain/loss InDels that are expected to affect the translation of protein sequences and their biological activity, 33% of the non-frameshift InDels were predicted as evolutionary intolerant with potential impact on protein functions. Moreover, about 2.5% of the InDels coincided with the most-conserved elements previously mapped on the chicken genome and are likely to define functional elements. InDels potentially affecting protein function were found to be enriched for certain gene-classes e.g. those associated with cell proliferation, chromosome and Golgi organization, spermatogenesis, and muscle contraction. CONCLUSIONS The large catalogue of InDels presented in this study along with their associated information such as functional annotation, estimated allele frequency, etc. are expected to serve as a rich resource for application in future research and breeding in the chicken.
Affiliation(s)
- Clarissa Boschiero
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Current address: Departamento de Zootecnia, University of Sao Paulo/ESALQ, Piracicaba, SP, 13418-900, Brazil.
- Almas A Gheyas
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Hannah K Ralph
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Lel Eory
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Bob Paton
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Richard Kuo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- Pete Kaiser
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
- David W Burt
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
|
23
|
Jiang Y, Turinsky AL, Brudno M. The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection. Nucleic Acids Res 2015; 43:7217-28. [PMID: 26130710 PMCID: PMC4551921 DOI: 10.1093/nar/gkv677] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 03/04/2015] [Accepted: 06/19/2015] [Indexed: 12/22/2022] Open
Abstract
With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them.
Affiliation(s)
- Yue Jiang
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Center for Biomedical Informatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
- Andrei L Turinsky
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Michael Brudno
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Program in Genetics and Genome Biology, Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 3G4, Canada
|
24
|
Wittler R, Marschall T, Schönhuth A, Mäkinen V. Repeat- and error-aware comparison of deletions. Bioinformatics 2015; 31:2947-54. [PMID: 25979471 DOI: 10.1093/bioinformatics/btv304] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/06/2014] [Accepted: 05/08/2015] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION The number of reported genetic variants is rapidly growing, empowered by ever faster accumulation of next-generation sequencing data. A major issue is comparability. Standards that address the combined problem of inaccurately predicted breakpoints and repeat-induced ambiguities are missing. This decisively lowers the quality of 'consensus' callsets and hampers the removal of duplicate entries in variant databases, which can have deleterious effects in downstream analyses. RESULTS We introduce a sound framework for comparison of deletions that captures both tool-induced inaccuracies and repeat-induced ambiguities. We present a maximum matching algorithm that outputs virtual duplicates among two sets of predictions/annotations. We demonstrate that our approach is clearly superior over ad hoc criteria, like overlap, and that it can reduce the redundancy among callsets substantially. We also identify large amounts of duplicate entries in the Database of Genomic Variants, which points out the immediate relevance of our approach. AVAILABILITY AND IMPLEMENTATION Implementation is open source and available from https://bitbucket.org/readdi/readdi CONTACT roland.wittler@uni-bielefeld.de or t.marschall@mpi-inf.mpg.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
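The idea of a maximum matching between two deletion callsets can be illustrated with a minimal sketch. This is not the authors' implementation: the compatibility rule below (same chromosome, both breakpoints within a fixed `max_shift`) is a hypothetical stand-in for the repeat- and error-aware criterion the abstract refers to, and all names are illustrative.

```python
from typing import List, Tuple

# A deletion as (chromosome, start, end); the representation is an assumption.
Deletion = Tuple[str, int, int]


def compatible(a: Deletion, b: Deletion, max_shift: int) -> bool:
    """Hypothetical criterion: same chromosome, both breakpoints within max_shift."""
    return (
        a[0] == b[0]
        and abs(a[1] - b[1]) <= max_shift
        and abs(a[2] - b[2]) <= max_shift
    )


def maximum_matching(
    set_a: List[Deletion], set_b: List[Deletion], max_shift: int = 50
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of a maximum bipartite matching between callsets."""
    # Edges connect every compatible pair of calls.
    adj = [
        [j for j, b in enumerate(set_b) if compatible(a, b, max_shift)]
        for a in set_a
    ]
    match_b = [-1] * len(set_b)  # match_b[j] = index in set_a matched to j

    def try_augment(i: int, seen: set) -> bool:
        # Augmenting-path search (Kuhn's algorithm).
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_b[j] == -1 or try_augment(match_b[j], seen):
                match_b[j] = i
                return True
        return False

    for i in range(len(set_a)):
        try_augment(i, set())
    return sorted((i, j) for j, i in enumerate(match_b) if i != -1)
```

Unlike a greedy pairing by overlap, the matching reassigns an earlier pair when that lets one more call find a partner, so each call is reported as a duplicate of at most one call in the other set.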
Affiliation(s)
- Roland Wittler
- Genome Informatics, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany, Center for Bioinformatics, Saarland University and Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany, Centrum Wiskunde & Informatica (CWI), Life Sciences Group, Amsterdam, The Netherlands and Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Finland
- Tobias Marschall
- Genome Informatics, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany, Center for Bioinformatics, Saarland University and Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany, Centrum Wiskunde & Informatica (CWI), Life Sciences Group, Amsterdam, The Netherlands and Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Finland
- Alexander Schönhuth
- Genome Informatics, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany, Center for Bioinformatics, Saarland University and Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany, Centrum Wiskunde & Informatica (CWI), Life Sciences Group, Amsterdam, The Netherlands and Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Finland
- Veli Mäkinen
- Genome Informatics, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany, Center for Bioinformatics, Saarland University and Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany, Centrum Wiskunde & Informatica (CWI), Life Sciences Group, Amsterdam, The Netherlands and Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Finland
|
25
|
Wu Z, Tembrock LR, Ge S. Are differences in genomic data sets due to true biological variants or errors in genome assembly: an example from two chloroplast genomes. PLoS One 2015; 10:e0118019. [PMID: 25658309 PMCID: PMC4320078 DOI: 10.1371/journal.pone.0118019] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 08/29/2014] [Accepted: 01/07/2015] [Indexed: 01/01/2023] Open
Abstract
DNA sequencing has been revolutionized by the development of high-throughput sequencing technologies. Plummeting costs and the massive throughput capacities of second and third generation sequencing platforms have transformed many fields of biological research. Concurrently, new data processing pipelines made rapid de novo genome assemblies possible. However, high quality data are critically important for all investigations in the genomic era. We used chloroplast genomes of one Oryza species (O. australiensis) to compare differences in sequence quality: one genome (GU592209) was obtained through Illumina sequencing and reference-guided assembly and the other genome (KJ830774) was obtained via target enrichment libraries and shotgun sequencing. Based on the whole genome alignment, GU592209 was more similar to the reference genome (O. sativa: AY522330) with 99.2% sequence identity (SI value) compared with the 98.8% SI values in the KJ830774 genome; whereas the opposite result was obtained when the SI values in coding and noncoding regions of GU592209 and KJ830774 were compared. Additionally, the junctions of two single copies and repeat copies in the chloroplast genome exhibited differences. Phylogenetic analyses were conducted using these sequences, and the different data sets yielded dissimilar topologies: phylogenetic replacements of the two individuals were remarkably different based on whole genome sequencing or SNP data and insertions and deletions (indels) data. Thus, we concluded that the genomic composition of GU592209 was heterogeneous in coding and non-coding regions. These findings should impel biologists to carefully consider the quality of sequencing and assembly when working with next-generation data.
Affiliation(s)
- Zhiqiang Wu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
- Department of Biology, Colorado State University, Fort Collins, Colorado, United States of America
- Luke R. Tembrock
- Department of Biology, Colorado State University, Fort Collins, Colorado, United States of America
- Song Ge
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
|
26
|
Ghoneim DH, Myers JR, Tuttle E, Paciorkowski AR. Comparison of insertion/deletion calling algorithms on human next-generation sequencing data. BMC Res Notes 2014; 7:864. [PMID: 25435282 PMCID: PMC4265454 DOI: 10.1186/1756-0500-7-864] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 07/31/2014] [Accepted: 11/21/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Insertions/deletions (indels) are the second most common type of genomic variant and the most common type of structural variant. Identification of indels in next generation sequencing data is a challenge, and algorithms commonly used for indel detection have not been compared on a research cohort of human subject genomic data. Guidelines for the optimal detection of biologically significant indels are limited. We analyzed three sets of human next generation sequencing data (48 samples of a 200 gene target exon sequencing, 45 samples of whole exome sequencing, and 2 samples of whole genome sequencing) using three algorithms for indel detection (Pindel, Genome Analysis Tool Kit's UnifiedGenotyper and HaplotypeCaller). RESULTS We observed variation in indel calls across the three algorithms. The intersection of the three tools comprised only 5.70% of targeted exon, 19.52% of whole exome, and 14.25% of whole genome indel calls. The majority of the discordant indels were of lower read depth and likely to be false positives. When software parameters were kept consistent across the three targets, HaplotypeCaller produced the most reliable results. Pindel results did not validate well without adjustments to parameters to account for varied read depth and number of samples per run. Adjustments to Pindel's M (minimum support for event) parameter improved both concordance and validation rates. Pindel was able to identify large deletions that surpassed the length capabilities of the GATK algorithms. CONCLUSIONS Despite the observed variability in indel identification, we discerned strengths among the individual algorithms on specific data sets. This allowed us to suggest best practices for indel calling. Pindel's low validation rate of indel calls made in targeted exon sequencing suggests that HaplotypeCaller is better suited for short indels and multi-sample runs in targets with very high read depth. 
Pindel allows for optimization of minimum support for events and is best used for detection of larger indels at lower read depths.
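The kind of concordance figure quoted above (the share of indel calls found by all three tools) can be computed along the following lines. This is only a sketch under stated assumptions: the abstract does not give the denominator, so the union of all calls is assumed here, and calls are compared by exact match on a hypothetical `(chrom, pos, ref, alt)` key after any normalization.

```python
from typing import Dict, Set, Tuple

# Hypothetical call key: (chromosome, position, reference allele, alternate allele).
Indel = Tuple[str, int, str, str]


def concordance_percent(calls: Dict[str, Set[Indel]]) -> float:
    """Percentage of the union of all calls that every caller reports.

    Assumes the denominator is the union across callers; the source abstract
    does not state which denominator was used.
    """
    call_sets = list(calls.values())
    union: Set[Indel] = set().union(*call_sets)
    if not union:
        return 0.0
    shared = set.intersection(*call_sets)
    return 100.0 * len(shared) / len(union)


if __name__ == "__main__":
    # Toy data only; not taken from the study.
    example = {
        "caller_a": {("chr1", 100, "AT", "A"), ("chr1", 250, "G", "GCC")},
        "caller_b": {("chr1", 100, "AT", "A"), ("chr2", 40, "C", "CT")},
        "caller_c": {("chr1", 100, "AT", "A"), ("chr3", 7, "TAA", "T")},
    }
    print(f"{concordance_percent(example):.2f}% of calls shared by all callers")
```

Exact-key matching undercounts agreement when tools represent the same indel differently, for example shifted within a repeat, so calls would normally be left-aligned and normalized before comparison.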
Affiliation(s)
- Dalia H Ghoneim
- Center for Neural Development and Disease, University of Rochester Medical Center, 601 Elmwood Avenue, Rochester, NY, USA.
|
27
|
Coexistence and within-host evolution of diversified lineages of hypermutable Pseudomonas aeruginosa in long-term cystic fibrosis infections. PLoS Genet 2014; 10:e1004651. [PMID: 25330091 PMCID: PMC4199492 DOI: 10.1371/journal.pgen.1004651] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.7] [Received: 04/21/2014] [Accepted: 08/03/2014] [Indexed: 12/14/2022] Open
Abstract
The advent of high-throughput sequencing techniques has made it possible to follow the genomic evolution of pathogenic bacteria by comparing longitudinally collected bacteria sampled from human hosts. Such studies in the context of chronic airway infections by Pseudomonas aeruginosa in cystic fibrosis (CF) patients have indicated high bacterial population diversity. Such diversity may be driven by hypermutability resulting from DNA mismatch repair system (MRS) deficiency, a common trait evolved by P. aeruginosa strains in CF infections. No studies to date have utilized whole-genome sequencing to investigate within-host population diversity or long-term evolution of mutators in CF airways. We sequenced the genomes of 13 and 14 isolates of P. aeruginosa mutator populations from an Argentinian and a Danish CF patient, respectively. Our collection of isolates spanned 6 and 20 years of patient infection history, respectively. We sequenced 11 isolates from a single sample from each patient to allow in-depth analysis of population diversity. Each patient was infected by clonal populations of bacteria that were dominated by mutators. The in vivo mutation rate of the populations was ∼100 SNPs/year–∼40-fold higher than rates in normo-mutable populations. Comparison of the genomes of 11 isolates from the same sample showed extensive within-patient genomic diversification; the populations were composed of different sub-lineages that had coexisted for many years since the initial colonization of the patient. Analysis of the mutations identified genes that underwent convergent evolution across lineages and sub-lineages, suggesting that the genes were targeted by mutation to optimize pathogenic fitness. Parallel evolution was observed in reduction of overall catabolic capacity of the populations. These findings are useful for understanding the evolution of pathogen populations and identifying new targets for control of chronic infections. 
Patients with cystic fibrosis (CF) are often colonized by a single clone of the common, widespread bacterium Pseudomonas aeruginosa, resulting in chronic airway infections. Long-term persistence of the bacteria involves the emergence and selection of multiple phenotypic variants. Among these are “mutator” variants characterized by increased mutation rates resulting from the inactivation of DNA repair systems. The genetic evolution of mutators during the course of chronic infection is poorly understood, and the effects of hypermutability on bacterial population structure have not been studied using genomic approaches. We evaluated the genomic changes undergone by mutator populations of P. aeruginosa obtained from single sputum samples from two chronically infected CF patients, and found that mutators completely dominated the infecting population in both patients. These populations displayed high genomic diversity based on vast accumulation of stochastic mutations. Our results are in contrast to the concept of a homogeneous population consisting of a single dominant clone; rather, they support a model of populations structured by diverse subpopulations that coexist within the patient. Certain genes involved in adaptation were highly and convergently mutated in both lineages, suggesting that these genes were beneficial and potentially responsible for the co-selection of mutator alleles.
|
28
|
Human metapneumovirus infection induces significant changes in small noncoding RNA expression in airway epithelial cells. Mol Ther Nucleic Acids 2014; 3:e163. [PMID: 24845106 PMCID: PMC4040629 DOI: 10.1038/mtna.2014.18] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 10/31/2013] [Accepted: 04/12/2014] [Indexed: 12/14/2022]
Abstract
Small noncoding RNAs (sncRNAs), such as microRNAs (miRNA), virus-derived sncRNAs, and more recently identified tRNA-derived RNA fragments, are critical to posttranscriptional control of genes. Upon viral infection, host cells alter their sncRNA expression as a defense mechanism, while viruses can circumvent host defenses and promote their own propagation by affecting host cellular sncRNA expression or by expressing viral sncRNAs. Therefore, characterizing sncRNA profiles in response to viral infection is an important tool for understanding host–virus interaction, and for antiviral strategy development. Human metapneumovirus (hMPV), a recently identified pathogen, is a major cause of lower respiratory tract infections in infants and children. To investigate whether sncRNAs play a role in hMPV infection, we analyzed the changes in sncRNA profiles of airway epithelial cells in response to hMPV infection using ultrahigh-throughput sequencing. Of the cloned sncRNAs, miRNA was dominant in A549 cells, with the percentage of miRNA increasing in a time-dependent manner after the infection. In addition, several hMPV-derived sncRNAs and corresponding ribonucleases for their biogenesis were identified. hMPV M2-2 protein was revealed to be a key viral protein regulating miRNA expression. In summary, this study revealed several novel aspects of hMPV-mediated sncRNA expression, providing a new perspective on hMPV–host interactions.
|
29
|
Within-host evolution of Pseudomonas aeruginosa reveals adaptation toward iron acquisition from hemoglobin. mBio 2014; 5:e00966-14. [PMID: 24803516 PMCID: PMC4010824 DOI: 10.1128/mbio.00966-14] [Citation(s) in RCA: 129] [Impact Index Per Article: 11.7] [Indexed: 12/11/2022] Open
Abstract
Pseudomonas aeruginosa airway infections are a major cause of mortality and morbidity of cystic fibrosis (CF) patients. In order to persist, P. aeruginosa depends on acquiring iron from its host, and multiple different iron acquisition systems may be active during infection. This includes the pyoverdine siderophore and the Pseudomonas heme utilization (phu) system. While the regulation and mechanisms of several iron-scavenging systems are well described, it is not clear whether such systems are targets for selection during adaptation of P. aeruginosa to the host environment. Here we investigated the within-host evolution of the transmissible P. aeruginosa DK2 lineage. We found positive selection for promoter mutations leading to increased expression of the phu system. By mimicking conditions of the CF airways in vitro, we experimentally demonstrate that increased expression of phuR confers a growth advantage in the presence of hemoglobin, thus suggesting that P. aeruginosa evolves toward iron acquisition from hemoglobin. To rule out that this adaptive trait is specific to the DK2 lineage, we inspected the genomes of additional P. aeruginosa lineages isolated from CF airways and found similar adaptive evolution in two distinct lineages (DK1 and PA clone C). Furthermore, in all three lineages, phuR promoter mutations coincided with the loss of pyoverdine production, suggesting that within-host adaptation toward heme utilization is triggered by the loss of pyoverdine production. Targeting heme utilization might therefore be a promising strategy for the treatment of P. aeruginosa infections in CF patients. Most bacterial pathogens depend on scavenging iron within their hosts, which makes the battle for iron between pathogens and hosts a hallmark of infection. Accordingly, the ability of the opportunistic pathogen Pseudomonas aeruginosa to cause chronic infections in cystic fibrosis (CF) patients also depends on iron-scavenging systems. 
While the regulation and mechanisms of several such iron-scavenging systems have been well described, not much is known about how the within-host selection pressures act on the pathogens’ ability to acquire iron. Here, we investigated the within-host evolution of P. aeruginosa, and we found evidence that P. aeruginosa during long-term infections evolves toward iron acquisition from hemoglobin. This adaptive strategy might be due to a selective loss of other iron-scavenging mechanisms and/or an increase in the availability of hemoglobin at the site of infection. This information is relevant to the design of novel CF therapeutics and the development of models of chronic CF infections.
|
30
|
Bergström A, Simpson JT, Salinas F, Barré B, Parts L, Zia A, Nguyen Ba AN, Moses AM, Louis EJ, Mustonen V, Warringer J, Durbin R, Liti G. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol 2014; 31:872-88. [PMID: 24425782 PMCID: PMC3969562 DOI: 10.1093/molbev/msu037] [Citation(s) in RCA: 215] [Impact Index Per Article: 19.5] [Indexed: 12/13/2022] Open
Abstract
The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.
Affiliation(s)
- Anders Bergström
- Institute for Research on Cancer and Ageing, Nice (IRCAN), University of Nice, Nice, France
- Francisco Salinas
- Institute for Research on Cancer and Ageing, Nice (IRCAN), University of Nice, Nice, France
- Benjamin Barré
- Institute for Research on Cancer and Ageing, Nice (IRCAN), University of Nice, Nice, France
- Leopold Parts
- The Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Amin Zia
- Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Stanford Center for Genomics and Personalized Medicine, Stanford University School of Medicine
- Alex N. Nguyen Ba
- Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Alan M. Moses
- Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Edward J. Louis
- Centre of Genetic Architecture of Complex Traits, University of Leicester, Leicester, United Kingdom
- Ville Mustonen
- The Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Jonas Warringer
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
- Richard Durbin
- The Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Gianni Liti
- Institute for Research on Cancer and Ageing, Nice (IRCAN), University of Nice, Nice, France
|
31
|
Gschloessl B, Vogel H, Burban C, Heckel D, Streiff R, Kerdelhué C. Comparative analysis of two phenologically divergent populations of the pine processionary moth (Thaumetopoea pityocampa) by de novo transcriptome sequencing. Insect Biochem Mol Biol 2014; 46:31-42. [PMID: 24468684 DOI: 10.1016/j.ibmb.2014.01.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/10/2013] [Revised: 01/11/2014] [Accepted: 01/13/2014] [Indexed: 06/03/2023]
Abstract
The pine processionary moth Thaumetopoea pityocampa is a Mediterranean lepidopteran defoliator that is undergoing a rapid range expansion towards higher latitudes and altitudes due to current climate warming. Its phenology, the time of sexual reproduction, is certainly a key trait for the local adaptation of the processionary moth to climatic conditions. Moreover, an exceptional case of allochronic differentiation was discovered ca. 15 years ago in this species. A population with a shifted phenology (the summer population, SP) co-exists near Leiria, Portugal, with a population following the classical cycle (the winter population, WP). The existence of this population is an outstanding opportunity to decipher the genetic bases of phenology. No genomic resources were so far available for T. pityocampa. We developed a high-throughput sequencing approach to build a first reference transcriptome, and to proceed with comparative analyses of the sympatric SP and WP. We pooled RNA extracted from whole individuals of various developmental stages, and performed a transcriptome characterisation for both populations combining Roche 454-FLX and traditional Sanger data. The obtained sequences were clustered into ca. 12,000 transcripts corresponding to 9265 unigenes. The mean transcript coverage was 21.9 reads per bp. Almost 70% of the de novo assembled transcripts displayed significant similarity to previously published proteins and around 50% of the transcripts contained a full-length coding region. Comparative analyses of the population transcriptomes allowed us to investigate genes specifically expressed in only one of the studied populations, and to identify the most divergent homologous SP/WP transcripts. The most divergent pairs of transcripts did not correspond to obvious phenology-related candidate genes, and 43% could not be functionally annotated. This study provides the first comprehensive genome-wide resource for the target species T. pityocampa. Many of the assembled genes are orthologs of published Lepidoptera genes, which allows gene-specific re-sequencing to be carried out. Data mining has allowed the identification of SNP loci that will be useful for population genomic approaches and genome-wide scans of population differentiation to identify signatures of selection.
Affiliation(s)
- Bernhard Gschloessl
- INRA, UMR CBGP (INRA/IRD/CIRAD/Montpellier Supagro), Campus International de Baillarguet, CS30016, F-34988 Montferrier-sur-Lez Cedex, France.
| | - Heiko Vogel
- Max Planck Institute for Chemical Ecology, Department of Entomology, 07745 Jena, Germany
| | - Christian Burban
- INRA, UMR1202 BIOGECO, 69 Route d'Arcachon, F-33612 Cestas Cedex, France
| | - David Heckel
- Max Planck Institute for Chemical Ecology, Department of Entomology, 07745 Jena, Germany
| | - Réjane Streiff
- INRA, UMR CBGP (INRA/IRD/CIRAD/Montpellier Supagro), Campus International de Baillarguet, CS30016, F-34988 Montferrier-sur-Lez Cedex, France
| | - Carole Kerdelhué
- INRA, UMR CBGP (INRA/IRD/CIRAD/Montpellier Supagro), Campus International de Baillarguet, CS30016, F-34988 Montferrier-sur-Lez Cedex, France
|
32
|
Kim TM, Laird PW, Park PJ. The landscape of microsatellite instability in colorectal and endometrial cancer genomes. Cell 2014; 155:858-68. [PMID: 24209623 DOI: 10.1016/j.cell.2013.10.015] [Citation(s) in RCA: 291] [Impact Index Per Article: 26.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2013] [Revised: 07/11/2013] [Accepted: 10/02/2013] [Indexed: 12/30/2022]
Abstract
Microsatellites, simple tandem repeats present at millions of sites in the human genome, can shorten or lengthen due to a defect in DNA mismatch repair. We present here a comprehensive genome-wide analysis of the prevalence, mutational spectrum, and functional consequences of microsatellite instability (MSI) in cancer genomes. We analyzed MSI in 277 colorectal and endometrial cancer genomes (including 57 microsatellite-unstable ones) using exome and whole-genome sequencing data. Recurrent MSI events in coding sequences showed tumor type specificity, elevated frameshift-to-inframe ratios, and lower transcript levels than wild-type alleles. Moreover, genome-wide analysis revealed differences in the distribution of MSI versus point mutations, including overrepresentation of MSI in euchromatic and intronic regions compared to heterochromatic and intergenic regions, respectively, and depletion of MSI at nucleosome-occupied sequences. Our results provide a panoramic view of MSI in cancer genomes, highlighting their tumor type specificity, impact on gene expression, and the role of chromatin organization.
Affiliation(s)
- Tae-Min Kim
- Center for Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Cancer Evolution Research Center, College of Medicine, The Catholic University of Korea, Seoul 137-701, Korea
|
33
|
Characterization of genetic diversity in the nematode Pristionchus pacificus from population-scale resequencing data. Genetics 2014; 196:1153-65. [PMID: 24443445 DOI: 10.1534/genetics.113.159855] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
The hermaphroditic nematode Pristionchus pacificus is an established model system for comparative studies with Caenorhabditis elegans in developmental biology, ecology, and population genetics. In this study, we present whole-genome sequencing data of 104 P. pacificus strains and the draft assembly of the obligate outcrossing sister species P. exspectatus. We characterize genetic diversity within P. pacificus and investigate the population genetic processes shaping this diversity. P. pacificus is 10 times more diverse than C. elegans and exhibits substantial population structure that allows us to probe its evolution on multiple timescales. Consistent with reduced effective recombination in this self-fertilizing species, we find haplotype blocks that span several megabases. Using the P. exspectatus genome as an outgroup, we polarized variation in P. pacificus and found a site frequency spectrum (SFS) that decays more rapidly than expected in neutral models. The SFS at putatively neutral sites is U shaped, which is a characteristic feature of pervasive linked selection. Based on the additional findings (i) that the majority of nonsynonymous variation is eliminated over timescales on the order of the separation between clades, (ii) that diversity is reduced in gene-rich regions, and (iii) that highly differentiated clades show very similar patterns of diversity, we conclude that purifying selection on many mutations with weak effects is a major force shaping genetic diversity in P. pacificus.
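The site frequency spectrum mentioned in this abstract can be illustrated with a minimal sketch. It assumes each polymorphic site has already been polarized against an outgroup and reduced to a count of sampled genomes carrying the derived allele; the function names and the toy input are hypothetical and are not taken from the study. The comparison curve uses the standard neutral expectation, in which the number of sites with derived-allele count k is proportional to 1/k.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n_samples):
    """Unfolded SFS: entry k-1 is the number of sites whose derived allele
    occurs in exactly k of n_samples genomes, for k = 1 .. n_samples - 1.
    Sites with count 0 or n_samples are not polymorphic and are ignored."""
    counts = Counter(derived_counts)
    return [counts.get(k, 0) for k in range(1, n_samples)]


def neutral_expectation(n_samples, n_sites):
    """Expected SFS under the standard neutral model (proportional to 1/k),
    scaled so that the entries sum to n_sites."""
    weights = [1.0 / k for k in range(1, n_samples)]
    total = sum(weights)
    return [n_sites * w / total for w in weights]


if __name__ == "__main__":
    # Toy data (hypothetical): derived-allele counts at six sites, five genomes.
    observed = unfolded_sfs([1, 1, 1, 2, 4, 4], n_samples=5)
    expected = neutral_expectation(n_samples=5, n_sites=sum(observed))
    for k, (obs, exp) in enumerate(zip(observed, expected), start=1):
        print(f"k={k}: observed={obs}, neutral expectation={exp:.2f}")
```

An excess in both the lowest and highest frequency classes relative to the 1/k curve is what gives a spectrum the U shape described in the abstract.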
|
34
|
Tan SYY, Chua SL, Liu Y, Høiby N, Andersen LP, Givskov M, Song Z, Yang L. Comparative genomic analysis of rapid evolution of an extreme-drug-resistant Acinetobacter baumannii clone. Genome Biol Evol 2013; 5:807-18. [PMID: 23538992 PMCID: PMC3673627 DOI: 10.1093/gbe/evt047] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023] Open
Abstract
The emergence of extreme-drug-resistant (EDR) bacterial strains in hospital and nonhospital clinical settings is a large and growing public health threat. Understanding the antibiotic resistance mechanisms at the genomic level can facilitate the development of next-generation agents. Here, comparative genomics has been employed to analyze the rapid evolution of an EDR Acinetobacter baumannii clone from the intensive care unit (ICU) of Rigshospitalet in Copenhagen. Two resistant A. baumannii strains, 48055 and 53264, were sequentially isolated from two individuals who had been admitted to the ICU within a 1-month interval. Multilocus sequence typing indicates that these two isolates belonged to ST208. The A. baumannii 53264 strain gained colistin resistance compared with the 48055 strain and became an EDR strain. Genome sequencing indicates that A. baumannii 53264 and 48055 have almost identical genomes: 61 single-nucleotide polymorphisms (SNPs) were found between them. The A. baumannii 53264 strain was assembled into 130 contigs, with a total length of 3,976,592 bp with 38.93% GC content. The A. baumannii 48055 strain was assembled into 135 contigs, with a total length of 4,049,562 bp with 39.00% GC content. Genome comparisons showed that this A. baumannii clone is classified as an International clone II strain and has 94% synteny with the A. baumannii ACICU strain. The ResFinder server identified a total of 14 antibiotic resistance genes in the A. baumannii clone. Proteomic analyses revealed that a putative porin protein was down-regulated when A. baumannii 53264 was exposed to antimicrobials, which may reduce the entry of antibiotics into the bacterial cell.
Affiliation(s)
- Sean Yang-Yi Tan
- Singapore Centre on Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore
|
35
|
Sandoval-Espinola WJ, Makwana ST, Chinn MS, Thon MR, Azcárate-Peril MA, Bruno-Bárcena JM. Comparative phenotypic analysis and genome sequence of Clostridium beijerinckii SA-1, an offspring of NCIMB 8052. MICROBIOLOGY (READING, ENGLAND) 2013; 159:2558-2570. [PMID: 24068240 PMCID: PMC7336276 DOI: 10.1099/mic.0.069534-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/15/2013] [Accepted: 09/24/2013] [Indexed: 01/07/2023]
Abstract
Production of butanol by solventogenic clostridia is controlled through metabolic regulation of the carbon flow and limited by its toxic effects. To overcome cell sensitivity to solvents, stress-directed evolution methodology was applied three decades ago to Clostridium beijerinckii NCIMB 8052, which yielded the SA-1 strain. Here, we evaluated SA-1 solventogenic capabilities when growing on a previously validated medium containing, as carbon- and energy-limiting substrates, sucrose, the products of its hydrolysis (d-glucose and d-fructose), and d-fructose alone. Comparative small-scale batch fermentations with controlled pH (pH 6.5) showed that SA-1 is a solvent hyper-producing strain capable of generating up to 16.1 g l⁻¹ of butanol and 26.3 g l⁻¹ of total solvents, 62.3 % and 63 % more than NCIMB 8052, respectively. This corresponds to butanol and solvent yields of 0.3 and 0.49 g g⁻¹, respectively (63 % and 65 % increase compared with NCIMB 8052). SA-1 showed a deficiency in d-fructose transport as suggested by its 7 h generation time compared with 1 h for NCIMB 8052. To potentially correlate physiological behaviour with genetic mutations, the whole genome of SA-1 was sequenced using the Illumina GA IIx platform. PCR and Sanger sequencing were performed to analyse the putative variations. As a result, four errors were confirmed and validated in the reference genome of NCIMB 8052 and a total of 10 genetic polymorphisms in SA-1. The genetic polymorphisms included eight single nucleotide variants, one small deletion and one large insertion that is an additional copy of the insertion sequence ISCb1. Two of the genetic polymorphisms, the serine threonine phosphatase cbs_4400 and the solute binding protein cbs_0769, may possibly explain some of the observed physiological behaviour, such as rerouting of the metabolic carbon flow, deregulation of the d-fructose phosphotransferase transport system and delayed sporulation.
Affiliation(s)
| | - Satya T. Makwana
- Department of Microbiology, North Carolina State University, Raleigh, NC 27695-7615, USA
| | - Mari S. Chinn
- Department of Biological and Agricultural Engineering, North Carolina State University, Raleigh, NC 27695-7615, USA
| | - Michael R. Thon
- Centro Hispano-Luso de Investigaciones Agrarias (CIALE), Departamento de Microbiología y Genética, Universidad de Salamanca, Calle Del Duero 12, Villamayor 37185, Spain
| | - M. Andrea Azcárate-Peril
- Department of Cell Biology and Physiology, School of Medicine and Microbiome Core Facility, Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, NC 27599-7545, USA
| | - José M. Bruno-Bárcena
- Department of Microbiology, North Carolina State University, Raleigh, NC 27695-7615, USA
|
36
|
The venom gland transcriptome of Latrodectus tredecimguttatus revealed by deep sequencing and cDNA library analysis. PLoS One 2013; 8:e81357. [PMID: 24312294 PMCID: PMC3842942 DOI: 10.1371/journal.pone.0081357] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2013] [Accepted: 10/10/2013] [Indexed: 01/01/2023] Open
Abstract
Latrodectus tredecimguttatus, commonly known as the black widow spider, is well known for its dangerous bite. Although its venom has been characterized extensively, some fundamental questions about its molecular composition remain unanswered. The limited transcriptome and genome data available prevent further understanding of spider venom at the molecular level. In the present study, we combined next-generation sequencing and conventional DNA sequencing to construct a venom gland transcriptome of the spider L. tredecimguttatus, which resulted in the identification of 9,666 and 480 high-confidence proteins among 34,334 de novo sequences and 1,024 cDNA sequences, respectively, by assembly, translation, filtering, quantification and annotation. Extensive functional analyses of these proteins indicated that mRNAs involved in RNA transport and spliceosome, protein translation, processing and transport were highly enriched in the venom gland, which is consistent with the specific function of venom glands, namely the production of toxins. Furthermore, we identified 146 toxin-like proteins forming 12 families, including 6 new families in this spider, in which α-LTX-Lt1a family2 is identified for the first time as a subfamily of the α-LTX-Lt1a family. The toxins were classified according to their bioactivities into five categories that functioned in a coordinated way. Few ion channels were expressed in venom gland cells, suggesting a possible mechanism of protection from the attack of their own toxins. The present study provides a gland transcriptome profile and extends our understanding of the toxinome of spiders and coordination mechanism for toxin production in protein expression quantity.
|
37
|
Sun L, Zhang Q, Xu Z, Yang W, Guo Y, Lu J, Pan H, Cheng T, Cai M. Genome-wide DNA polymorphisms in two cultivars of mei (Prunus mume sieb. et zucc.). BMC Genet 2013; 14:98. [PMID: 24093913 PMCID: PMC3851432 DOI: 10.1186/1471-2156-14-98] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2013] [Accepted: 09/25/2013] [Indexed: 11/10/2022] Open
Abstract
Background: Mei (Prunus mume Sieb. et Zucc.) is a famous ornamental plant and fruit crop grown in East Asian countries. Limited genetic resources, especially molecular markers, have hindered the progress of mei breeding projects. Here, we performed low-depth whole-genome sequencing of Prunus mume ‘Fenban’ and Prunus mume ‘Kouzi Yudie’ to identify high-quality polymorphic markers between the two cultivars on a large scale. Results: A total of 1464.1 Mb and 1422.1 Mb of ‘Fenban’ and ‘Kouzi Yudie’ sequencing data were uniquely mapped to the mei reference genome with about 6-fold coverage, respectively. We detected a large number of putative polymorphic markers from the 196.9 Mb of sequencing data shared by the two cultivars, which together contained 200,627 SNPs, 4,900 InDels, and 7,063 SSRs. Among these markers, 38,773 SNPs, 174 InDels, and 418 SSRs were distributed in the 22.4 Mb CDS region, and 63.0% of these marker-containing CDS sequences were assigned to GO terms. Subsequently, 670 selected SNPs were validated using Agilent’s SureSelect solution-phase hybridization assay. A subset of 599 SNPs was used to assess the genetic similarity of a panel of mei germplasm samples and a plum (P. salicina) cultivar, producing a set of informative diversity data. We also analyzed the frequency and distribution of detected InDels and SSRs in the mei genome and validated their usefulness as DNA markers. These markers were successfully amplified in the cultivars and in their segregating progeny. Conclusions: A large set of high-quality polymorphic SNPs, InDels, and SSRs were identified in parallel between ‘Fenban’ and ‘Kouzi Yudie’ using low-depth whole-genome sequencing. The study presents extensive data on these polymorphic markers, which can be useful for constructing high-resolution genetic maps, performing genome-wide association studies, and designing genomic selection strategies in mei.
Affiliation(s)
- Lidan Sun
- Beijing Key Laboratory of Ornamental Plants Germplasm Innovation and Molecular Breeding, National Engineering Research Center for Floriculture, College of Landscape Architecture, Beijing Forestry University, 100083 Beijing, P.R. China.
|
38
|
Marvig RL, Johansen HK, Molin S, Jelsbak L. Genome analysis of a transmissible lineage of Pseudomonas aeruginosa reveals pathoadaptive mutations and distinct evolutionary paths of hypermutators. PLoS Genet 2013; 9:e1003741. [PMID: 24039595 PMCID: PMC3764201 DOI: 10.1371/journal.pgen.1003741] [Citation(s) in RCA: 156] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2013] [Accepted: 07/08/2013] [Indexed: 11/18/2022] Open
Abstract
Genome sequencing of bacterial pathogens has advanced our understanding of their evolution, epidemiology, and response to antibiotic therapy. However, we still have only a limited knowledge of the molecular changes in in vivo evolving bacterial populations in relation to long-term, chronic infections. For example, it remains unclear what genes are mutated to facilitate the establishment of long-term existence in the human host environment, and in which way acquisition of a hypermutator phenotype with enhanced rates of spontaneous mutations influences the evolutionary trajectory of the pathogen. Here we perform a retrospective study of the DK2 clone type of P. aeruginosa isolated from Danish patients suffering from cystic fibrosis (CF), and analyze the genomes of 55 bacterial isolates collected from 21 infected individuals over 38 years. Our phylogenetic analysis of 8,530 mutations in the DK2 genomes shows that the ancestral DK2 clone type spread among CF patients through several independent transmission events. Subsequent to transmission, sub-lineages evolved independently for years in separate hosts, creating a unique possibility to study parallel evolution and identification of genes targeted by mutations to optimize pathogen fitness (pathoadaptive mutations). These genes were related to antibiotic resistance, the cell envelope, or regulatory functions, and we find that the prevalence of pathoadaptive mutations correlates with evolutionary success of co-evolving sub-lineages. The long-term co-existence of both normal and hypermutator populations enabled comparative investigations of the mutation dynamics in homopolymeric sequences in which hypermutators are particularly prone to mutations. We find a positive exponential correlation between the length of the homopolymer and its likelihood to acquire mutations and identify two homopolymer-containing genes preferentially mutated in hypermutators. 
This homopolymer-facilitated differential mutagenesis provides a novel genome-wide perspective on the different evolutionary trajectories of hypermutators, which may help explain their emergence in CF infections. Pseudomonas aeruginosa is the dominating pathogen of chronic airway infections in patients with cystic fibrosis (CF). Although bacterial long-term persistence in CF hosts involves mutation and selection of genetic variants with increased fitness in the CF lung environment, our understanding of the within-host evolutionary processes is limited. Here, we performed a retrospective study of the P. aeruginosa DK2 clone type, which is a transmissible clone isolated from chronically infected Danish CF patients over a period of 38 years. Whole-genome analysis of DK2 isolates enabled a fine-grained reconstruction of the recent evolutionary history of the DK2 lineage and an identification of bacterial genes targeted by mutations to optimize pathogen fitness. The identification of such pathoadaptive genes gives new insight into how the pathogen evolves under the selective pressures of the host immune system and drug therapies. Furthermore, isolates with increased rates of mutation (hypermutator phenotype) emerged in the DK2 lineage. While this phenotype may accelerate evolution, we also show that hypermutators display differential mutagenesis of certain genes, which enables them to follow alternative evolutionary pathways. Overall, our study identifies genes important for bacterial persistence and provides insight into the different mutational mechanisms that govern the adaptive genetic changes.
Affiliation(s)
- Rasmus Lykke Marvig
- Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
| | - Helle Krogh Johansen
- Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Department of Clinical Microbiology, Rigshospitalet, Copenhagen, Denmark
| | - Søren Molin
- Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
| | - Lars Jelsbak
- Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
|
39
|
Umbarger MA, Kennedy CJ, Saunders P, Breton B, Chennagiri N, Emhoff J, Greger V, Hallam S, Maganzini D, Micale C, Nizzari MM, Towne CF, Church GM, Porreca GJ. Next-generation carrier screening. Genet Med 2013; 16:132-40. [PMID: 23765052 PMCID: PMC3918543 DOI: 10.1038/gim.2013.83] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2013] [Accepted: 05/02/2013] [Indexed: 12/29/2022] Open
Abstract
PURPOSE: Carrier screening for recessive Mendelian disorders traditionally employs focused genotyping to interrogate limited sets of mutations most prevalent in specific ethnic groups. We sought to develop a next-generation DNA sequencing-based workflow to enable analysis of a more comprehensive set of disease-causing mutations. METHODS: We utilized molecular inversion probes to capture the protein-coding regions of 15 genes from genomic DNA isolated from whole blood and sequenced those regions using the Illumina HiSeq 2000 (Illumina, San Diego, CA). To assess the quality of the resulting data, we measured both the fraction of the targeted region yielding high-quality genotype calls, and the sensitivity and specificity of those calls by comparison with conventional Sanger sequencing across hundreds of samples. Finally, to improve the overall accuracy for detecting insertions and deletions, we introduce a novel assembly-based approach that substantially increases sensitivity without reducing specificity. RESULTS: We generated high-quality sequence for at least 99.8% of targeted base pairs in samples derived from blood and achieved high concordance with Sanger sequencing (sensitivity >99.9%, specificity >99.999%). Our novel algorithm is capable of detecting insertions and deletions inaccessible by current methods. CONCLUSION: Our next-generation DNA sequencing-based approach yields the accuracy and completeness necessary for a carrier screening test.
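The sensitivity and specificity figures quoted in this abstract follow the usual definitions, which can be sketched as below. This is a minimal illustration only: the per-site dictionary layout, the function name, and the toy data are assumptions, and the reference calls stand in for whatever orthogonal method (Sanger sequencing in the abstract) serves as truth.

```python
def sensitivity_specificity(calls, truth):
    """Compare per-site variant calls against reference calls.

    calls, truth: dicts mapping a site identifier to True (variant) or
    False (reference genotype). Sites absent from `calls` count as
    "no variant called". Returns (sensitivity, specificity)."""
    tp = fp = tn = fn = 0
    for site, is_variant in truth.items():
        called = calls.get(site, False)
        if is_variant and called:
            tp += 1          # variant present and detected
        elif is_variant and not called:
            fn += 1          # variant present but missed
        elif not is_variant and called:
            fp += 1          # variant called where none exists
        else:
            tn += 1          # correctly called as reference
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data (hypothetical): five sites, two of them true variants.
    truth = {1: True, 2: True, 3: False, 4: False, 5: False}
    calls = {1: True, 2: False, 3: False, 4: True, 5: False}
    sens, spec = sensitivity_specificity(calls, truth)
    print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Because reference-genotype sites vastly outnumber variant sites in a targeted panel, specificity is computed over a much larger denominator than sensitivity, which is one reason it can be reported to more decimal places.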
Affiliation(s)
| | - John Emhoff
- Good Start Genetics, Cambridge, Massachusetts, USA
| | - George M Church
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
|
40
|
Kodera H, Kato M, Nord AS, Walsh T, Lee M, Yamanaka G, Tohyama J, Nakamura K, Nakagawa E, Ikeda T, Ben-Zeev B, Lev D, Lerman-Sagie T, Straussberg R, Tanabe S, Ueda K, Amamoto M, Ohta S, Nonoda Y, Nishiyama K, Tsurusaki Y, Nakashima M, Miyake N, Hayasaka K, King MC, Matsumoto N, Saitsu H. Targeted capture and sequencing for detection of mutations causing early onset epileptic encephalopathy. Epilepsia 2013; 54:1262-9. [DOI: 10.1111/epi.12203] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/21/2013] [Indexed: 01/08/2023]
Affiliation(s)
- Hirofumi Kodera
- Department of Human Genetics; Yokohama City University Graduate School of Medicine; Yokohama Japan
| | - Mitsuhiro Kato
- Department of Pediatrics; Yamagata University Faculty of Medicine; Yamagata Japan
| | - Alex S. Nord
- Department of Genome Sciences and Department of Medicine; University of Washington; Seattle Washington U.S.A
| | - Tom Walsh
- Department of Genome Sciences and Department of Medicine; University of Washington; Seattle Washington U.S.A
| | - Ming Lee
- Department of Genome Sciences and Department of Medicine; University of Washington; Seattle Washington U.S.A
| | - Gaku Yamanaka
- Department of Pediatrics; Tokyo Medical University; Tokyo Japan
| | - Jun Tohyama
- Department of Pediatrics; Nishi-Niigata Chuo National Hospital; Niigata Japan
| | - Kazuyuki Nakamura
- Department of Human Genetics; Yokohama City University Graduate School of Medicine; Yokohama Japan
- Department of Pediatrics; Yamagata University Faculty of Medicine; Yamagata Japan
| | - Eiji Nakagawa
- Department of Child Neurology; National Center of Neurology and Psychiatry; Tokyo Japan
| | - Tae Ikeda
- Division of Pediatric Neurology; Osaka Medical Center and Research Institute for Maternal and Child Health; Osaka Japan
| | - Bruria Ben-Zeev
- The Edmond and Lily Safra Children's Hospital; Sheba Medical Center; Ramat Gan Israel
| | - Dorit Lev
- Metabolic Neurogenetic Clinic; Wolfson Medical Center; Holon Israel
| | - Rachel Straussberg
- Department of Neurogenetics; Schneider's Children Medical Center; Petah Tiqwa Israel
| | - Saori Tanabe
- Department of Pediatrics; Nihonkai General Hospital; Sakata Japan
| | - Masano Amamoto
- Pediatric Emergency Center; Kitakyusyu City Yahata Hospital; Kitakyushu Japan
| | - Sayaka Ohta
- Department of Pediatrics; Graduate School of Medicine; University of Tokyo; Tokyo Japan
| | - Yutaka Nonoda
- Department of Pediatrics; School of Medicine; Kitasato University; Sagamihara Japan
| | - Kiyomi Nishiyama
- Department of Human Genetics; Yokohama City University Graduate School of Medicine; Yokohama Japan
| | - Yoshinori Tsurusaki
- Department of Human Genetics; Yokohama City University Graduate School of Medicine; Yokohama Japan
| | - Mitsuko Nakashima
- Department of Human Genetics; Yokohama City University Graduate School of Medicine; Yokohama Japan
| | - Noriko Miyake
- Department of Human Genetics; Yokohama City University Graduate School of Medicine; Yokohama Japan
| | - Kiyoshi Hayasaka
- Department of Pediatrics; Yamagata University Faculty of Medicine; Yamagata Japan
| | - Mary-Claire King
- Department of Genome Sciences and Department of Medicine; University of Washington; Seattle Washington U.S.A
| | - Naomichi Matsumoto
- Department of Human Genetics; Yokohama City University Graduate School of Medicine; Yokohama Japan
| | - Hirotomo Saitsu
- Department of Human Genetics; Yokohama City University Graduate School of Medicine; Yokohama Japan
|
41
|
Assmus J, Kleffe J, Schmitt AO, Brockmann GA. Equivalent indels--ambiguous functional classes and redundancy in databases. PLoS One 2013; 8:e62803. [PMID: 23658777 PMCID: PMC3642179 DOI: 10.1371/journal.pone.0062803] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2012] [Accepted: 03/26/2013] [Indexed: 01/09/2023] Open
Abstract
There is considerable interest in studying sequenced variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the location of insertions and deletions still poses problems. Each insertion and deletion causes a change of sequence. Yet, due to low complexity or repetitive sequence structures, the same indel can sometimes be annotated in different ways. Two indels which differ in allele sequence and position can be one and the same, i.e. the alternative sequence of the whole chromosome is identical in both cases and, therefore, the two deletions are biologically equivalent. In such a case, it is impossible to identify the exact position of an indel merely based on sequence alignment. Thus, variation entries in a mutation database are not necessarily uniquely defined. We prove the existence of a contiguous region around an indel in which all deletions of the same length are biologically identical. Databases often show only one of several possible locations for a given variation. Furthermore, different database entries can represent equivalent variation events. We identified 1,045,590 such problematic entries of insertions and deletions out of 5,860,408 indel entries in the current human database of Ensembl. Equivalent indels are found in sequence regions of different functions like exons, introns or 5' and 3' UTRs. One and the same variation can be assigned to several different functional classifications of which only one is correct. We implemented an algorithm that determines for each indel database entry its complete set of equivalent indels which is uniquely characterized by the indel itself and a given interval of the reference sequence.
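The idea of a contiguous region of equivalent deletions can be made concrete with a short sketch. It is not the authors' implementation: the function names, the 0-based coordinates, and the toy reference are assumptions chosen for illustration. The sketch relies on one observation: a deletion of length L starting at position p can be shifted one base to the right without changing the resulting sequence exactly when the base at p equals the base at p + L, and symmetrically to the left.

```python
def apply_deletion(ref, start, length):
    """Return the sequence obtained by deleting ref[start:start + length]."""
    return ref[:start] + ref[start + length:]


def equivalent_deletion_range(ref, start, length):
    """Return (leftmost, rightmost) 0-based start positions of all deletions
    of the given length that yield the same sequence as the deletion
    starting at `start`."""
    left = start
    # Shifting left by one is neutral if the base entering the deleted
    # window on the left equals the base leaving it on the right.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    # Shifting right by one is neutral if the base leaving the window on
    # the left equals the base entering it on the right.
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


if __name__ == "__main__":
    ref = "GACACACT"                                 # toy reference (hypothetical)
    lo, hi = equivalent_deletion_range(ref, start=1, length=2)
    print(lo, hi)                                    # 1 5
    print({apply_deletion(ref, s, 2) for s in range(lo, hi + 1)})  # {'GACACT'}
```

Reporting the whole interval rather than a single position is one way to avoid the redundant or conflicting database entries the abstract describes, since two entries with overlapping intervals and equal length describe the same event.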
Affiliation(s)
- Jens Assmus
- Breeding Biology and Molecular Genetics, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Jürgen Kleffe
- Institut für Molekularbiologie und Bioinformatik, Charité Berlin, Berlin, Germany
| | - Armin O. Schmitt
- Breeding Biology and Molecular Genetics, Humboldt-Universität zu Berlin, Berlin, Germany
| | - Gudrun A. Brockmann
- Breeding Biology and Molecular Genetics, Humboldt-Universität zu Berlin, Berlin, Germany
|
42
|
O'Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, Bodily P, Tian L, Hakonarson H, Johnson WE, Wei Z, Wang K, Lyon GJ. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med 2013; 5:28. [PMID: 23537139 PMCID: PMC3706896 DOI: 10.1186/gm432] [Citation(s) in RCA: 307] [Impact Index Per Article: 25.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2012] [Revised: 03/23/2013] [Accepted: 03/27/2013] [Indexed: 12/18/2022] Open
Abstract
Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be. Methods: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage. Results: SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. Of the CG variants falling within targeted regions in exome sequencing, 11% were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven) demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family. Conclusions: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
Affiliation(s)
- Jason O'Rawe
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA ; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA
- Tao Jiang
- BGI-Shenzhen, Shenzhen 518000, China
- Yiyang Wu
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA ; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA
- Wei Wang
- New Jersey Institute of Technology, Martin Luther King Jr. Blvd, Newark, 07103, USA
- Paul Bodily
- Brigham Young University, N University Ave, Provo, 84606, USA
- Lifeng Tian
- Children's Hospital of Philadelphia, Civic Center Blvd, Philadelphia, 19104, USA
- Hakon Hakonarson
- Children's Hospital of Philadelphia, Civic Center Blvd, Philadelphia, 19104, USA
- W Evan Johnson
- Boston University School of Medicine, E Concord St, Boston, 02118, USA
- Zhi Wei
- New Jersey Institute of Technology, Martin Luther King Jr. Blvd, Newark, 07103, USA
- Kai Wang
- University of Southern California, 1501 San Pablo Street, Los Angeles, 90089, USA ; Utah Foundation for Biomedical Research, E 3300 S, Salt Lake City, 84106, USA
- Gholson J Lyon
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA ; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA ; Utah Foundation for Biomedical Research, E 3300 S, Salt Lake City, 84106, USA
|
43
|
Kotlarz D, Ziętara N, Uzel G, Weidemann T, Braun CJ, Diestelhorst J, Krawitz PM, Robinson PN, Hecht J, Puchałka J, Gertz EM, Schäffer AA, Lawrence MG, Kardava L, Pfeifer D, Baumann U, Pfister ED, Hanson EP, Schambach A, Jacobs R, Kreipe H, Moir S, Milner JD, Schwille P, Mundlos S, Klein C. Loss-of-function mutations in the IL-21 receptor gene cause a primary immunodeficiency syndrome. J Exp Med 2013; 210:433-43. [PMID: 23440042 PMCID: PMC3600901 DOI: 10.1084/jem.20111229] [Citation(s) in RCA: 155] [Impact Index Per Article: 12.9] [Indexed: 01/06/2023]
Abstract
A primary immunodeficiency syndrome caused by loss-of-function mutations in the IL-21 receptor exhibits impaired B, T, and NK cell function. Primary immunodeficiencies (PIDs) represent exquisite models for studying mechanisms of human host defense. In this study, we report on two unrelated kindreds, with two patients each, who had cryptosporidial infections associated with chronic cholangitis and liver disease. Using exome and candidate gene sequencing, we identified two distinct homozygous loss-of-function mutations in the interleukin-21 receptor gene (IL21R; c.G602T, p.Arg201Leu and c.240_245delCTGCCA, p.C81_H82del). The IL-21R Arg201Leu mutation causes aberrant trafficking of the IL-21R to the plasma membrane, abrogates IL-21 ligand binding, and leads to defective phosphorylation of signal transducer and activator of transcription 1 (STAT1), STAT3, and STAT5. We observed impaired IL-21–induced proliferation and immunoglobulin class-switching in B cells, cytokine production in T cells, and NK cell cytotoxicity. Our study indicates that human IL-21R deficiency causes an immunodeficiency and highlights the need for early diagnosis and allogeneic hematopoietic stem cell transplantation in affected children.
Affiliation(s)
- Daniel Kotlarz
- Department of Pediatric Hematology/Oncology, Hannover Medical School, 30625 Hannover, Germany
|
44
|
Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis. Quant Biol 2013; 1:54-70. [PMID: 25741452 DOI: 10.1007/s40484-013-0006-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 12/15/2022]
Abstract
Chromatin immunoprecipitation coupled with massive parallel sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of DNA binding proteins such as transcription factors or modified histones. As more and more experimental laboratories are adopting ChIP-seq to unravel the transcriptional and epigenetic regulatory mechanisms, computational analyses of ChIP-seq also become increasingly comprehensive and sophisticated. In this article, we review current computational methodology for ChIP-seq analysis, recommend useful algorithms and workflows, and introduce quality control measures at different analytical steps. We also discuss how ChIP-seq could be integrated with other types of genomic assays, such as gene expression profiling and genome-wide association studies, to provide a more comprehensive view of gene regulatory mechanisms in important physiological and pathological processes.
Affiliation(s)
- Hyunjin Shin
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute/Harvard School of Public Health, Boston, MA 02115, USA
- Tao Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute/Harvard School of Public Health, Boston, MA 02115, USA
- Xikun Duan
- Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai 200092, China
- Yong Zhang
- Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai 200092, China
- X Shirley Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute/Harvard School of Public Health, Boston, MA 02115, USA
|
45
|
Computational and bioinformatics frameworks for next-generation whole exome and genome sequencing. ScientificWorldJournal 2013; 2013:730210. [PMID: 23365548 PMCID: PMC3556895 DOI: 10.1155/2013/730210] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 10/28/2012] [Accepted: 11/22/2012] [Indexed: 12/28/2022] Open
Abstract
It has become increasingly apparent that one of the major hurdles in the genomic age will be the bioinformatics challenges of next-generation sequencing. We provide an overview of a general framework of bioinformatics analysis. For each of the three stages of (1) alignment, (2) variant calling, and (3) filtering and annotation, we describe the analysis required and survey the different software packages that are used. Furthermore, we discuss possible future developments as data sources grow and highlight opportunities for new bioinformatics tools to be developed.
|
46
|
Zhou X, Bao S, Wang B, Zhang X, Song YQ. Short read mapping for exome sequencing. Methods Mol Biol 2013; 1038:93-111. [PMID: 23872971 DOI: 10.1007/978-1-62703-514-9_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023]
Abstract
Mapping short reads to the reference genome is very often the prerequisite for applications utilizing the next-generation sequencing technologies. A dozen of software tools developed for this purpose have been widely used. But many practical issues remained when utilizing them to build a computational pipeline for downstream analyses. In this chapter, we describe the read mapping procedures adopted in our lab for the exome sequencing studies as an example to illustrate those practical details.
Affiliation(s)
- Xueya Zhou
- Bioinformatics Division, Tsinghua National Laboratory of Information Science and Technology, Beijing, China
|
47
|
Jimenez-Lopez JC, Gachomo EW, Sharma S, Kotchoni SO. Genome sequencing and next-generation sequence data analysis: A comprehensive compilation of bioinformatics tools and databases. Am J Mol Biol 2013. [DOI: 10.4236/ajmb.2013.32016] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
|
48
|
Lescai F, Bonfiglio S, Bacchelli C, Chanudet E, Waters A, Sisodiya SM, Kasperavičiūtė D, Williams J, Harold D, Hardy J, Kleta R, Cirak S, Williams R, Achermann JC, Anderson J, Kelsell D, Vulliamy T, Houlden H, Wood N, Sheerin U, Tonini GP, Mackay D, Hussain K, Sowden J, Kinsler V, Osinska J, Brooks T, Hubank M, Beales P, Stupka E. Characterisation and validation of insertions and deletions in 173 patient exomes. PLoS One 2012; 7:e51292. [PMID: 23251486 PMCID: PMC3522676 DOI: 10.1371/journal.pone.0051292] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/11/2012] [Accepted: 11/01/2012] [Indexed: 01/01/2023] Open
Abstract
Recent advances in genomics technologies have spurred unprecedented efforts in genome and exome re-sequencing aiming to unravel the genetic component of rare and complex disorders. While in rare disorders this allowed the identification of novel causal genes, the missing heritability paradox in complex diseases remains so far elusive. Despite rapid advances of next-generation sequencing, both the technology and the analysis of the data it produces are in its infancy. At present there is abundant knowledge pertaining to the role of rare single nucleotide variants (SNVs) in rare disorders and of common SNVs in common disorders. Although the 1,000 genome project has clearly highlighted the prevalence of rare variants and more complex variants (e.g. insertions, deletions), their role in disease is as yet far from elucidated. We set out to analyse the properties of sequence variants identified in a comprehensive collection of exome re-sequencing studies performed on samples from patients affected by a broad range of complex and rare diseases (N = 173). Given the known potential for Loss of Function (LoF) variants to be false positive, we performed an extensive validation of the common, rare and private LoF variants identified, which indicated that most of the private and rare variants identified were indeed true, while common novel variants had a significantly higher false positive rate. Our results indicated a strong enrichment of very low-frequency insertion/deletion variants, so far under-investigated, which might be difficult to capture with low coverage and imputation approaches and for which most of study designs would be under-powered. These insertions and deletions might play a significant role in disease genetics, contributing specifically to the underlining rare and private variation predicted to be discovered through next generation sequencing.
Affiliation(s)
- Francesco Lescai
- UCL Genomics, University College London, London, United Kingdom
- Division of Research Strategy, University College London, London, United Kingdom
- GOSgene, UCL Institute of Child Health, University College London, London, United Kingdom
- Silvia Bonfiglio
- Centre for Translational Genomics and Bioinformatics, San Raffaele Scientific Institute, Milan, Italy
- Chiara Bacchelli
- GOSgene, UCL Institute of Child Health, University College London, London, United Kingdom
- UCL Institute of Child Health, University College London, London, United Kingdom
- Estelle Chanudet
- GOSgene, UCL Institute of Child Health, University College London, London, United Kingdom
- UCL Institute of Child Health, University College London, London, United Kingdom
- Aoife Waters
- UCL Institute of Child Health, University College London, London, United Kingdom
- Sanjay M. Sisodiya
- UCL Institute of Neurology, University College London, London, United Kingdom
- Julie Williams
- Department of Psychological Medicine, Cardiff University, Cardiff, United Kingdom
- Denise Harold
- Department of Psychological Medicine, Cardiff University, Cardiff, United Kingdom
- John Hardy
- UCL Institute of Neurology, University College London, London, United Kingdom
- Robert Kleta
- UCL Institute of Child Health, University College London, London, United Kingdom
- Sebahattin Cirak
- UCL Institute of Child Health, University College London, London, United Kingdom
- Richard Williams
- UCL Institute of Child Health, University College London, London, United Kingdom
- John C. Achermann
- UCL Institute of Child Health, University College London, London, United Kingdom
- John Anderson
- UCL Institute of Child Health, University College London, London, United Kingdom
- David Kelsell
- Blizard Institute of Cell and Molecular Science, Barts and The London, London, United Kingdom
- Tom Vulliamy
- Blizard Institute of Cell and Molecular Science, Barts and The London, London, United Kingdom
- Henry Houlden
- UCL Institute of Neurology, University College London, London, United Kingdom
- Nicholas Wood
- UCL Institute of Neurology, University College London, London, United Kingdom
- Una Sheerin
- UCL Institute of Neurology, University College London, London, United Kingdom
- Gian Paolo Tonini
- Translational Oncopathology, National Cancer Research Institute (IST), Genova, Italy
- Donna Mackay
- Institute of Ophthalmology, University College London, London, United Kingdom
- Khalid Hussain
- UCL Institute of Child Health, University College London, London, United Kingdom
- Jane Sowden
- UCL Institute of Child Health, University College London, London, United Kingdom
- Veronica Kinsler
- UCL Institute of Child Health, University College London, London, United Kingdom
- Justyna Osinska
- UCL Genomics, University College London, London, United Kingdom
- Tony Brooks
- UCL Genomics, University College London, London, United Kingdom
- Mike Hubank
- UCL Genomics, University College London, London, United Kingdom
- UCL Institute of Child Health, University College London, London, United Kingdom
- Philip Beales
- GOSgene, UCL Institute of Child Health, University College London, London, United Kingdom
- UCL Institute of Child Health, University College London, London, United Kingdom
- Elia Stupka
- UCL Genomics, University College London, London, United Kingdom
- Centre for Translational Genomics and Bioinformatics, San Raffaele Scientific Institute, Milan, Italy
- Cancer Institute, University College London, London, United Kingdom
|
49
|
Mutations in PIGO, a member of the GPI-anchor-synthesis pathway, cause hyperphosphatasia with mental retardation. Am J Hum Genet 2012; 91:146-51. [PMID: 22683086 DOI: 10.1016/j.ajhg.2012.05.004] [Citation(s) in RCA: 120] [Impact Index Per Article: 9.2] [Received: 03/14/2012] [Revised: 04/19/2012] [Accepted: 05/11/2012] [Indexed: 11/23/2022] Open
Abstract
Hyperphosphatasia with mental retardation syndrome (HPMRS), an autosomal-recessive form of intellectual disability characterized by facial dysmorphism, seizures, brachytelephalangy, and persistent elevated serum alkaline phosphatase (hyperphosphatasia), was recently shown to be caused by mutations in PIGV, a member of the glycosylphosphatidylinositol (GPI)-anchor-synthesis pathway. However, not all individuals with HPMRS harbor mutations in this gene. By exome sequencing, we detected compound-heterozygous mutations in PIGO, a gene coding for a membrane protein of the same molecular pathway, in two siblings with HPMRS, and we then found by Sanger sequencing further mutations in another affected individual; these mutations cosegregated in the investigated families. The mutant transcripts are aberrantly spliced, decrease the membrane stability of the protein, or impair enzyme function such that GPI-anchor synthesis is affected and the level of GPI-anchored substrates localized at the cell surface is reduced. Our data identify PIGO as the second gene associated with HPMRS and suggest that a deficiency in GPI-anchor synthesis is the underlying molecular pathomechanism of HPMRS.
|
50
|
Rau MH, Marvig RL, Ehrlich GD, Molin S, Jelsbak L. Deletion and acquisition of genomic content during early stage adaptation of Pseudomonas aeruginosa to a human host environment. Environ Microbiol 2012; 14:2200-11. [PMID: 22672046 DOI: 10.1111/j.1462-2920.2012.02795.x] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Indexed: 01/27/2023]
Abstract
Adaptation of bacterial pathogens to a permanently host-associated lifestyle by means of deletion or acquisition of genetic material is usually examined through comparison of present-day isolates to a distant theoretical ancestor. This limits the resolution of the adaptation process. We conducted a retrospective study of the dissemination of the P. aeruginosa DK2 clone type among patients suffering from cystic fibrosis, sequencing the genomes of 45 isolates collected from 16 individuals over 35 years. Analysis of the genomes provides a high-resolution examination of the dynamics and mechanisms of the change in genetic content during the early stage of host adaptation by this P. aeruginosa strain as it adapts to the cystic fibrosis (CF) lung of several patients. Considerable genome reduction is detected predominantly through the deletion of large genomic regions, and up to 8% of the genome is deleted in one isolate. Compared with in vitro estimates the resulting average deletion rates are 12- to 36-fold higher. Deletions occur through both illegitimate and homologous recombination, but they are not IS element mediated as previously reported for early stage host adaptation. Uptake of novel DNA sequences during infection is limited as only one prophage region was putatively inserted in one isolate, demonstrating that early host adaptation is characterized by the reduction of genomic repertoire rather than acquisition of novel functions. Finally, we also describe the complete genome of this highly adapted pathogenic strain of P. aeruginosa to strengthen the genetic basis, which serves to help our understanding of microbial evolution in a natural environment.
Affiliation(s)
- Martin H Rau
- Department of Systems Biology, Technical University of Denmark, 2800 Lyngby, Denmark
|