1
Mayer JU, Hilligan KL, Chandler JS, Eccles DA, Old SI, Domingues RG, Yang J, Webb GR, Munoz-Erazo L, Hyde EJ, Wakelin KA, Tang SC, Chappell SC, von Daake S, Brombacher F, Mackay CR, Sher A, Tussiwand R, Connor LM, Gallego-Ortega D, Jankovic D, Le Gros G, Hepworth MR, Lamiable O, Ronchese F. Homeostatic IL-13 in healthy skin directs dendritic cell differentiation to promote TH2 and inhibit TH17 cell polarization. Nat Immunol 2021; 22:1538-1550. [PMID: 34795444 DOI: 10.1038/s41590-021-01067-0] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 05/28/2021] [Accepted: 10/05/2021] [Indexed: 01/27/2023]
Abstract
The signals driving the adaptation of type 2 dendritic cells (DC2s) to diverse peripheral environments remain mostly undefined. We show that differentiation of CD11b^lo migratory DC2s, a DC2 population unique to the dermis, required IL-13 signaling dependent on the transcription factors STAT6 and KLF4, whereas DC2s in lung and small intestine were STAT6-independent. Similarly, human DC2s in skin expressed an IL-4 and IL-13 gene signature that was not found in blood, spleen and lung DCs. In mice, IL-13 was secreted homeostatically by dermal innate lymphoid cells and was independent of microbiota, TSLP or IL-33. In the absence of IL-13 signaling, dermal DC2s were stable in number but remained CD11b^hi and showed defective activation in response to allergens, with diminished ability to support the development of IL-4+GATA3+ helper T (TH) cells, whereas antifungal IL-17+RORγt+ TH cells were increased. Therefore, homeostatic IL-13 fosters a noninflammatory skin environment that supports allergic sensitization.
Affiliation(s)
- Johannes U Mayer
- Malaghan Institute of Medical Research, Wellington, New Zealand
- Department of Dermatology and Allergology, Philipps University Marburg, Marburg, Germany
- Kerry L Hilligan
- Malaghan Institute of Medical Research, Wellington, New Zealand
- Immunobiology Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- David A Eccles
- Malaghan Institute of Medical Research, Wellington, New Zealand
- Samuel I Old
- Malaghan Institute of Medical Research, Wellington, New Zealand
- Rita G Domingues
- Lydia Becker Institute of Immunology and Inflammation, Manchester Collaborative Centre for Inflammation Research, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Jianping Yang
- Malaghan Institute of Medical Research, Wellington, New Zealand
- Greta R Webb
- Malaghan Institute of Medical Research, Wellington, New Zealand
- Evelyn J Hyde
- Malaghan Institute of Medical Research, Wellington, New Zealand
- Frank Brombacher
- International Centre for Genetic Engineering and Biotechnology (ICGEB), Cape Town component & Institute of Infectious Diseases and Molecular Medicine (IDM), Division of Immunology, Health Science Faculty, University of Cape Town, Cape Town, South Africa
- Charles R Mackay
- Infection and Immunity Program, Monash Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
- Alan Sher
- Immunobiology Section, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Roxane Tussiwand
- Department of Biomedicine, University of Basel, Basel, Switzerland
- Immune Regulation Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Lisa M Connor
- Malaghan Institute of Medical Research, Wellington, New Zealand
- School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
- David Gallego-Ortega
- The Kinghorn Cancer Centre, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Centre for Single-Cell Technology, School of Biomedical Engineering, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, Australia
- Dragana Jankovic
- Immunoparasitology Unit, Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Graham Le Gros
- Malaghan Institute of Medical Research, Wellington, New Zealand
- Matthew R Hepworth
- Lydia Becker Institute of Immunology and Inflammation, Manchester Collaborative Centre for Inflammation Research, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, UK
- Franca Ronchese
- Malaghan Institute of Medical Research, Wellington, New Zealand.
2
Garrison E, Sirén J, Novak AM, Hickey G, Eizenga JM, Dawson ET, Jones W, Garg S, Markello C, Lin MF, Paten B, Durbin R. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018; 36:875-879. [PMID: 30125266 PMCID: PMC6126949 DOI: 10.1038/nbt.4227] [Citation(s) in RCA: 353] [Impact Index Per Article: 50.4] [Received: 12/01/2017] [Accepted: 07/23/2018] [Indexed: 12/30/2022]
Abstract
Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, improving accuracy over alignment to a linear reference and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.
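To make concrete why representing variation in the reference helps read mapping, here is a minimal Python sketch, assuming a toy acyclic graph on the forward strand only with a single SNP "bubble" (T/C). The node layout and the functions `spell_paths` and `best_graph_mismatches` are hypothetical illustrations, not part of vg; vg maps reads with generalized compressed suffix arrays and supports bidirected graphs with cycles, whereas exhaustive path enumeration as done here is only feasible for tiny graphs.

```python
# Toy variation graph: forward strand only and acyclic (a simplification; real
# vg graphs are bidirected and may contain cycles, inversions and duplications).
NODES = {1: "ACG", 2: "T", 3: "C", 4: "GAT"}   # node id -> DNA label
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}     # node id -> successor ids


def spell_paths(node, sink):
    """Return every sequence spelled by a walk from `node` to `sink`."""
    if node == sink:
        return [NODES[node]]
    return [NODES[node] + rest
            for nxt in EDGES[node]
            for rest in spell_paths(nxt, sink)]


def hamming(a, b):
    """Count mismatches between two equal-length strings (no gaps)."""
    return sum(x != y for x, y in zip(a, b))


def best_graph_mismatches(read, source=1, sink=4):
    """Fewest mismatches of `read` against any same-length source-to-sink path."""
    return min(hamming(read, p)
               for p in spell_paths(source, sink) if len(p) == len(read))


linear_ref = "ACGTGAT"   # the linear reference carries only the T allele
read = "ACGCGAT"         # a read carrying the alternative C allele

print(hamming(read, linear_ref))    # 1 mismatch against the linear reference
print(best_graph_mismatches(read))  # 0 mismatches against the graph
```

The read carrying the non-reference allele is penalized against the linear reference but matches a path in the graph exactly, which is the reference bias the abstract describes.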
Affiliation(s)
- Erik Garrison
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Jouni Sirén
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Adam M Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- Jordan M Eizenga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- Eric T Dawson
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- National Cancer Institute, Rockville, Maryland, USA
- Department of Genetics, University of Cambridge, Cambridge, UK
- William Jones
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Shilpa Garg
- Max-Planck-Institut für Informatik, Saarbrücken, Germany
- Charles Markello
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA
- Richard Durbin
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
3
Zhou Z, Lundstrøm I, Tran-Dien A, Duchêne S, Alikhan NF, Sergeant MJ, Langridge G, Fotakis AK, Nair S, Stenøien HK, Hamre SS, Casjens S, Christophersen A, Quince C, Thomson NR, Weill FX, Ho SYW, Gilbert MTP, Achtman M. Pan-genome Analysis of Ancient and Modern Salmonella enterica Demonstrates Genomic Stability of the Invasive Para C Lineage for Millennia. Curr Biol 2018; 28:2420-2428.e10. [PMID: 30033331 PMCID: PMC6089836 DOI: 10.1016/j.cub.2018.05.058] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 11/03/2017] [Revised: 02/09/2018] [Accepted: 05/18/2018] [Indexed: 02/04/2023]
Abstract
Salmonella enterica serovar Paratyphi C causes enteric (paratyphoid) fever in humans. Its presentation can range from asymptomatic infections of the bloodstream to gastrointestinal or urinary tract infection or even a fatal septicemia [1]. Paratyphi C is very rare in Europe and North America except in occasional travelers from South and East Asia or Africa, where the disease is more common [2, 3]. However, early 20th-century observations in Eastern Europe [3, 4] suggest that Paratyphi C enteric fever may once have had a wide-ranging impact on human societies. Here, we describe a draft Paratyphi C genome (Ragna) recovered from the 800-year-old skeleton (SK152) of a young woman in Trondheim, Norway. Paratyphi C sequences were recovered from her teeth and bones, suggesting that she died of enteric fever and demonstrating that these bacteria have long caused invasive salmonellosis in Europeans. Comparative analyses against modern Salmonella genome sequences revealed that Paratyphi C is a clade within the Para C lineage, which also includes serovars Choleraesuis, Typhisuis, and Lomita. Although Paratyphi C only infects humans, Choleraesuis causes septicemia in pigs and boar [5] (and occasionally humans), and Typhisuis causes epidemic swine salmonellosis (chronic paratyphoid) in domestic pigs [2, 3]. These different host specificities likely evolved in Europe over the last ∼4,000 years since the time of their most recent common ancestor (tMRCA) and are possibly associated with the differential acquisitions of two genomic islands, SPI-6 and SPI-7. The tMRCAs of these bacterial clades coincide with the timing of pig domestication in Europe [6].
Highlights
- Salmonella enterica aDNA sequences were found within 800-year-old teeth and bone
- The invasive Para C lineage was defined from 50,000 modern S. enterica genomes
- The Para C lineage includes Ragna, the aDNA genome, and human and swine pathogens
- Only a few genomic changes occurred in the Para C lineage over its 3,000-year history
Affiliation(s)
- Zhemin Zhou
- Warwick Medical School, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK.
- Inge Lundstrøm
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Alicia Tran-Dien
- Unité des Bactéries Pathogènes Entériques, Institut Pasteur, Paris, France
- Sebastián Duchêne
- Department of Biochemistry and Molecular Biology, University of Melbourne, Parkville, Victoria 3010, Australia
- Nabil-Fareed Alikhan
- Warwick Medical School, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK
- Martin J Sergeant
- Warwick Medical School, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK
- Anna K Fotakis
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Stian S Hamre
- Department of Archaeology, History, Cultural Studies and Religion, University of Bergen, Post Box 7805, 5020 Bergen, Norway
- Sherwood Casjens
- Pathology Department, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Christopher Quince
- Warwick Medical School, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney NSW 2006, Australia
- M Thomas P Gilbert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- NTNU University Museum, N-7491 Trondheim, Norway
- Mark Achtman
- Warwick Medical School, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK.
4
Hamada M, Ono Y, Asai K, Frith MC. Training alignment parameters for arbitrary sequencers with LAST-TRAIN. Bioinformatics 2017; 33:926-928. [PMID: 28039163 PMCID: PMC5351549 DOI: 10.1093/bioinformatics/btw742] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 10/09/2016] [Accepted: 11/18/2016] [Indexed: 01/05/2023]
Abstract
Summary: LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from Ion Torrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. Availability and Implementation: The source code is freely available at http://last.cbrc.jp/. Contact: mhamada@waseda.jp or mcfrith@edu.k.u-tokyo.ac.jp. Supplementary information: Supplementary data are available at Bioinformatics online.
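The idea of inferring substitution scores that fit observed substitution frequencies can be sketched with classic log-odds scoring, s(x, y) = scale · ln(p(x, y) / (p(x) · p(y))). The minimal Python sketch below computes such scores once from a hypothetical list of already-aligned (reference, read) base pairs. It is an assumption-laden illustration, not LAST-TRAIN's procedure: LAST-TRAIN re-estimates its parameters iteratively from alignments and also fits gap scores, which are not modeled here. The function name and the `scale` and `pseudocount` parameters are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def log_odds_scores(aligned_pairs, scale=1.0, pseudocount=1.0):
    """Integer substitution scores from observed (reference, read) base pairs.

    s(x, y) = round(scale * ln(p_xy / (p_x * p_y))), with a pseudocount so
    that unseen pairs still get a finite score.
    """
    counts = Counter(aligned_pairs)
    pairs = [(x, y) for x in BASES for y in BASES]
    total = sum(counts[p] + pseudocount for p in pairs)
    p_xy = {p: (counts[p] + pseudocount) / total for p in pairs}
    # Marginal base frequencies on the reference side and on the read side.
    p_ref = {x: sum(p_xy[(x, y)] for y in BASES) for x in BASES}
    p_read = {y: sum(p_xy[(x, y)] for x in BASES) for y in BASES}
    return {(x, y): round(scale * math.log(p_xy[(x, y)] / (p_ref[x] * p_read[y])))
            for (x, y) in pairs}


# Hypothetical data: mostly matches, plus a few transitions (A<->G, C<->T).
pairs = [(b, b) for b in BASES * 25] + [("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")]
scores = log_odds_scores(pairs)
print(scores[("A", "A")], scores[("A", "G")], scores[("A", "C")])  # 1 -1 -2
```

Because transitions are observed more often than transversions in this toy data, they receive a milder penalty. With `scale=1` the scores are in natural-log units and quite coarse; a larger scale yields finer-grained integer scores.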
Affiliation(s)
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Kiyoshi Asai
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
- Martin C Frith
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
5
Kerpedjiev P, Frellsen J, Lindgreen S, Krogh A. Adaptable probabilistic mapping of short reads using position specific scoring matrices. BMC Bioinformatics 2014; 15:100. [PMID: 24717095 PMCID: PMC4021105 DOI: 10.1186/1471-2105-15-100] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 03/03/2014] [Accepted: 03/28/2014] [Indexed: 11/10/2022]
Abstract
Background: Modern DNA sequencing methods produce vast amounts of data that often must be mapped to a reference genome. Most existing programs use the number of mismatches between the read and the genome as a measure of quality. This approach lacks a statistical foundation and, for some data types, can result in many wrongly mapped reads. Here we present a probabilistic mapping method based on position-specific scoring matrices, which can take into account not only the quality scores of the reads but also user-specified models of evolution and data-specific biases. Results: We show how evolution, data-specific biases, and sequencing errors are naturally dealt with probabilistically. Our method achieves better results than Bowtie and BWA on simulated and real ancient and PAR-CLIP reads, as well as on simulated reads from the AT-rich organism P. falciparum, when modeling the biases of these data. For simulated Illumina reads, the method has consistently higher sensitivity for both single-end and paired-end data. We also show that our probabilistic approach can limit the problem of random matches from short reads of contamination and that it improves the mapping of real reads from one organism (D. melanogaster) to a related genome (D. simulans). Conclusion: The presented work is an implementation of a novel approach to short read mapping where quality scores, prior mismatch probabilities and mapping qualities are handled in a statistically sound manner. The resulting implementation provides not only a tool for biologists working with low-quality and/or biased sequencing data but also a demonstration of the feasibility of using a probability-based alignment method on real and simulated data sets.
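The contrast between counting mismatches and scoring a read as a position-specific scoring matrix (PSSM) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: a uniform 0.25 background, sequencing errors spread evenly over the three other bases, no gaps, no evolutionary or data-specific bias model, and an exhaustive scan instead of an index. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def read_to_pssm(read, quals, background=0.25):
    """Turn a read plus Phred qualities into a position-specific scoring matrix.

    Column i holds log(P(genome base is x | read base, quality) / background),
    assuming a sequencing error is equally likely to be any of the other 3 bases.
    """
    pssm = []
    for base, q in zip(read, quals):
        err = 10 ** (-q / 10)  # Phred: Q = -10 * log10(P(error))
        col = {x: math.log(((1 - err) if x == base else err / 3) / background)
               for x in BASES}
        pssm.append(col)
    return pssm


def score_window(pssm, genome, start):
    """Log-odds score of placing the read at genome[start:start + len(pssm)]."""
    return sum(col[genome[start + i]] for i, col in enumerate(pssm))


def best_position(pssm, genome):
    """Ungapped placement with the highest PSSM score (exhaustive scan)."""
    return max(range(len(genome) - len(pssm) + 1),
               key=lambda s: score_window(pssm, genome, s))


def mismatches(read, genome, start):
    """Plain mismatch count of an ungapped placement."""
    return sum(a != b for a, b in zip(read, genome[start:start + len(read)]))


read, quals = "ACGT", [40, 40, 5, 40]  # third base has low quality
genome = "ACTTTTACGA"
pssm = read_to_pssm(read, quals)
print(mismatches(read, genome, 0), mismatches(read, genome, 6))  # 1 1
print(best_position(pssm, genome))  # 0: the mismatch there falls on the Q5 base
```

Both placements have one mismatch, so a mismatch count cannot separate them; the PSSM prefers position 0 because its mismatch falls on a base whose quality (Q5, error probability of about 0.32) makes it likely to be a sequencing error.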
Affiliation(s)
- Anders Krogh
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen, Denmark.
6
Torri F, Dinov ID, Zamanyan A, Hobel S, Genco A, Petrosyan P, Clark AP, Liu Z, Eggert P, Pierce J, Knowles JA, Ames J, Kesselman C, Toga AW, Potkin SG, Vawter MP, Macciardi F. Next generation sequence analysis and computational genomics using graphical pipeline workflows. Genes (Basel) 2012; 3:545-75. [PMID: 23139896 PMCID: PMC3490498 DOI: 10.3390/genes3030545] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 12/30/2022]
Abstract
Whole-genome and exome sequencing have already proven to be essential and powerful methods to identify genes responsible for simple Mendelian inherited disorders. These methods can be applied to complex disorders as well, and have been adopted as one of the current mainstream approaches in population genetics. These achievements have been made possible by next generation sequencing (NGS) technologies, which require substantial bioinformatics resources to analyze the dense and complex sequence data. The huge analytical burden of data from genome sequencing might be seen as a bottleneck slowing the publication of NGS papers at this time, especially in psychiatric genetics. We review the existing methods for processing NGS data, to place into context the rationale for the design of a computational resource. We describe our method, the Graphical Pipeline for Computational Genomics (GPCG), to perform the computational steps required to analyze NGS data. The GPCG implements flexible workflows for basic sequence alignment, sequence data quality control, single nucleotide polymorphism analysis, copy number variant identification, annotation, and visualization of results. These workflows cover all the analytical steps required for NGS data, from processing the raw reads to variant calling and annotation. The current version of the pipeline is freely available at http://pipeline.loni.ucla.edu. These applications of NGS analysis may gain clinical utility in the near future (e.g., identifying miRNA signatures in diseases) when the bioinformatics approach is made feasible. Taken together, the annotation tools and strategies that have been developed to retrieve information and test hypotheses about the functional role of variants present in the human genome will help to pinpoint the genetic risk factors for psychiatric disorders.
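GPCG expresses these analyses as workflows in which each step waits for the outputs it depends on. As a minimal sketch of that dependency structure (not the LONI Pipeline's own workflow format), the hypothetical step names below form a directed acyclic graph that Python's standard `graphlib` module can schedule. Placing quality control before alignment is an assumption for illustration; in practice QC can also run on aligned data.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical NGS workflow: each step maps to the set of steps it depends on.
WORKFLOW = {
    "quality_control": set(),
    "alignment": {"quality_control"},
    "snp_calling": {"alignment"},
    "cnv_detection": {"alignment"},
    "annotation": {"snp_calling", "cnv_detection"},
    "visualization": {"annotation"},
}


def parallel_batches(workflow):
    """Group steps into batches whose members can run concurrently."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in parallel_batches(WORKFLOW):
    print(batch)
```

Independent analyses such as SNP calling and CNV detection end up in the same batch, which is what lets a workflow engine run them in parallel once alignment is finished.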
Affiliation(s)
- Federica Torri
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA 92617, USA
- Biomedical Informatics Research Network (BIRN), Information Sciences Institute, University of Southern California, Los Angeles, CA 90292, USA
- Ivo D. Dinov
- Biomedical Informatics Research Network (BIRN), Information Sciences Institute, University of Southern California, Los Angeles, CA 90292, USA
- Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, CA 90095, USA
- Alen Zamanyan
- Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, CA 90095, USA
- Sam Hobel
- Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, CA 90095, USA
- Alex Genco
- Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, CA 90095, USA
- Petros Petrosyan
- Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, CA 90095, USA
- Andrew P. Clark
- Zilkha Neurogenetic Institute, USC Keck School of Medicine, Los Angeles, CA 90033, USA
- Zhizhong Liu
- Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, CA 90095, USA
- Paul Eggert
- Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, CA 90095, USA
- Department of Computer Science, University of California, Los Angeles, CA 90095, USA
- Jonathan Pierce
- Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, CA 90095, USA
- James A. Knowles
- Zilkha Neurogenetic Institute, USC Keck School of Medicine, Los Angeles, CA 90033, USA
- Joseph Ames
- Biomedical Informatics Research Network (BIRN), Information Sciences Institute, University of Southern California, Los Angeles, CA 90292, USA
- Carl Kesselman
- Biomedical Informatics Research Network (BIRN), Information Sciences Institute, University of Southern California, Los Angeles, CA 90292, USA
- Arthur W. Toga
- Biomedical Informatics Research Network (BIRN), Information Sciences Institute, University of Southern California, Los Angeles, CA 90292, USA
- Laboratory of Neuro Imaging (LONI), University of California, Los Angeles, CA 90095, USA
- Steven G. Potkin
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA 92617, USA
- Biomedical Informatics Research Network (BIRN), Information Sciences Institute, University of Southern California, Los Angeles, CA 90292, USA
- Marquis P. Vawter
- Functional Genomics Laboratory, Department of Psychiatry and Human Behavior, School of Medicine, University of California, Irvine, CA 92697, USA
- Fabio Macciardi
- Department of Psychiatry and Human Behavior, University of California, Irvine, CA 92617, USA
- Biomedical Informatics Research Network (BIRN), Information Sciences Institute, University of Southern California, Los Angeles, CA 90292, USA
- Author to whom correspondence should be addressed; Tel.: +1-949-824-4559; Fax: +1-949-824-2072
7
Sandoval-Espinola WJ, Makwana ST, Chinn MS, Thon MR, Azcárate-Peril MA, Bruno-Bárcena JM. Comparative phenotypic analysis and genome sequence of Clostridium beijerinckii SA-1, an offspring of NCIMB 8052. Microbiology (Reading) 2013; 159:2558-2570. [PMID: 24068240 PMCID: PMC7336276 DOI: 10.1099/mic.0.069534-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 05/15/2013] [Accepted: 09/24/2013] [Indexed: 01/07/2023]
Abstract
Production of butanol by solventogenic clostridia is controlled through metabolic regulation of the carbon flow and limited by its toxic effects. To overcome cell sensitivity to solvents, stress-directed evolution was applied three decades ago to Clostridium beijerinckii NCIMB 8052, producing the SA-1 strain. Here, we evaluated the solventogenic capabilities of SA-1 growing on a previously validated medium containing, as carbon- and energy-limiting substrates, sucrose and its hydrolysis products D-glucose and D-fructose, or D-fructose alone. Comparative small-scale batch fermentations with controlled pH (pH 6.5) showed that SA-1 is a solvent hyper-producing strain capable of generating up to 16.1 g l⁻¹ of butanol and 26.3 g l⁻¹ of total solvents, 62.3 % and 63 % more than NCIMB 8052, respectively. This corresponds to butanol and solvent yields of 0.3 and 0.49 g g⁻¹, respectively (63 % and 65 % increase compared with NCIMB 8052). SA-1 showed a deficiency in D-fructose transport, as suggested by its 7 h generation time compared with 1 h for NCIMB 8052. To potentially correlate physiological behaviour with genetic mutations, the whole genome of SA-1 was sequenced using the Illumina GA IIx platform. PCR and Sanger sequencing were performed to analyse the putative variations. As a result, four errors in the NCIMB 8052 reference genome and a total of 10 genetic polymorphisms in SA-1 were confirmed and validated. The genetic polymorphisms included eight single nucleotide variants, one small deletion and one large insertion that is an additional copy of the insertion sequence ISCb1. Two of the genetic polymorphisms, in the serine/threonine phosphatase cbs_4400 and the solute binding protein cbs_0769, may explain some of the observed physiological behaviour, such as rerouting of the metabolic carbon flow, deregulation of the D-fructose phosphotransferase transport system and delayed sporulation.
Affiliation(s)
- Satya T. Makwana
- Department of Microbiology, North Carolina State University, Raleigh, NC 27695-7615, USA
- Mari S. Chinn
- Department of Biological and Agricultural Engineering, North Carolina State University, Raleigh, NC 27695-7615, USA
- Michael R. Thon
- Centro Hispano-Luso de Investigaciones Agrarias (CIALE), Departamento de Microbiología y Genética, Universidad de Salamanca, Calle Del Duero 12, Villamayor 37185, Spain
- M. Andrea Azcárate-Peril
- Department of Cell Biology and Physiology, School of Medicine and Microbiome Core Facility, Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, NC 27599-7545, USA
- José M. Bruno-Bárcena
- Department of Microbiology, North Carolina State University, Raleigh, NC 27695-7615, USA
8
Abel HJ, Duncavage EJ. Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches. Cancer Genet 2013; 206:432-40. [PMID: 24405614 DOI: 10.1016/j.cancergen.2013.11.002] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.3] [Received: 06/29/2013] [Revised: 11/06/2013] [Accepted: 11/15/2013] [Indexed: 10/26/2022]
Abstract
Next generation sequencing (NGS), or massively parallel sequencing, refers to a collective group of methods in which numerous sequencing reactions take place simultaneously, resulting in enormous amounts of sequencing data at a small fraction of the cost of Sanger sequencing. Typically short (50-250 bp), NGS reads are first mapped to a reference genome, and then variants are called from the mapped data. While most NGS applications focus on the detection of single nucleotide variants (SNVs) or small insertions/deletions (indels), structural variation, including translocations, larger indels, and copy number variation (CNV), can be identified from the same data. Structural variation detection can be performed from whole genome NGS data or "targeted" data including exomes or gene panels. However, while targeted sequencing greatly increases sequencing coverage or depth of particular genes, it may introduce biases in the data that require specialized informatic analyses. In the past several years, there have been considerable advances in methods used to detect structural variation, and a full range of variants from SNVs to balanced translocations to CNV can now be detected with reasonable sensitivity from either whole genome or targeted NGS data. Such methods are being rapidly applied to clinical testing, where they can supplement or in some cases replace conventional fluorescence in situ hybridization or array-based testing. Here we review some of the informatics approaches used to detect structural variation from NGS data.
Affiliation(s)
- Haley J Abel
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Eric J Duncavage
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA.
9
Abstract
Many bioinformatics problems, such as sequence alignment, gene prediction, phylogenetic tree estimation and RNA secondary structure prediction, are affected by the 'uncertainty' of a solution, that is, even the most probable solution has an extremely small probability. This situation arises in estimation problems on high-dimensional discrete spaces, where the number of possible discrete solutions is immense. In the analysis of biological data or the development of prediction algorithms, this uncertainty should be handled carefully and appropriately. In this review, I will explain several methods to combat this uncertainty, presenting a number of examples in bioinformatics. The methods include (i) avoiding point estimation, (ii) maximum expected accuracy (MEA) estimation and (iii) several strategies for designing a pipeline that combines several prediction methods. I believe that the basic concepts and ideas described in this review will be generally useful for estimation problems in various areas of bioinformatics.
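The idea behind maximum expected accuracy (MEA) estimation can be illustrated in the simplest possible setting: a solution is a vector of independent binary labels, and posterior marginals P(label = 1) are already available. Under that assumption, maximizing an expected gain that rewards γ for each correctly predicted 1 and 1 for each correctly predicted 0 decomposes per position into a threshold on the marginal (a rule sometimes called a γ-centroid estimator). Structured problems such as RNA secondary structure impose constraints between labels and need dynamic programming instead. The gain definition, function names and marginals here are illustrative assumptions.

```python
from itertools import product


def expected_gain(labels, marginals, gamma):
    """Expected gain: gamma * P(1) for each predicted 1, P(0) for each predicted 0."""
    return sum(gamma * p if y else 1 - p for y, p in zip(labels, marginals))


def mea_threshold(marginals, gamma=1.0):
    """Maximize expected gain position by position: predict 1 iff p > 1/(gamma+1)."""
    threshold = 1.0 / (gamma + 1.0)
    return [1 if p > threshold else 0 for p in marginals]


def brute_force_best(marginals, gamma):
    """Check by enumerating all 2**n labelings (feasible only for tiny n)."""
    return list(max(product([0, 1], repeat=len(marginals)),
                    key=lambda ys: expected_gain(ys, marginals, gamma)))


marginals = [0.9, 0.4, 0.15, 0.6]  # hypothetical posterior P(label = 1) per position
for gamma in (0.5, 1.0, 4.0):
    print(gamma, mea_threshold(marginals, gamma), brute_force_best(marginals, gamma))
```

Raising γ lowers the threshold and trades specificity for sensitivity, so the estimator is tuned by a single parameter rather than by committing to one point estimate.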
10
Mahmud MP, Wiedenhoeft J, Schliep A. Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees. Bioinformatics 2012; 28:i325-i332. [PMID: 22962448 PMCID: PMC3436807 DOI: 10.1093/bioinformatics/bts380] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
Abstract
Motivation: Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulty dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics. Results: For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it as nearest neighbor queries in a q-gram frequency vector space. Using the L1 distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we achieve running times that match the state of the art. Additionally, running time and memory requirements are approximately constant for read lengths between 100 and 1000 bp. We provide a first proof of concept that geometric embedding is a promising paradigm for read mapping and that L1 distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurately than many popular read mappers over a wide range of structural variants. Availability and implementation: TreQ will be released under the GNU General Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net. Contact: pavelm@cs.rutgers.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
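The geometric-embedding idea can be sketched with trinucleotide (q = 3) frequency vectors. The minimal Python sketch below shows only the embedding and a lower bound on unit-cost edit distance derived from the standard q-gram counting argument (one substitution, insertion or deletion changes at most 2q q-gram counts); this is an assumption for illustration, not the paper's bound for affine gap costs, and TreQ's cache-oblivious kd-tree for nearest-neighbor search is not reproduced. Function names are hypothetical.

```python
import math
from collections import Counter


def qgram_profile(seq, q=3):
    """Frequency vector of all overlapping q-grams (q=3: trinucleotides)."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def l1_distance(p1, p2):
    """L1 distance between two q-gram frequency vectors (missing keys count 0)."""
    return sum(abs(p1[k] - p2[k]) for k in set(p1) | set(p2))


def edit_distance_lower_bound(s1, s2, q=3):
    """One substitution, insertion or deletion changes at most 2q q-gram counts,
    so L1 <= 2q * edit_distance, i.e. edit_distance >= ceil(L1 / (2q))."""
    d = l1_distance(qgram_profile(s1, q), qgram_profile(s2, q))
    return math.ceil(d / (2 * q))


read = "ACGTTCGT"    # one substitution away from the reference window
window = "ACGTACGT"
print(l1_distance(qgram_profile(read), qgram_profile(window)))  # 6
print(edit_distance_lower_bound(read, window))                  # 1
```

Because the bound never exceeds the true edit distance, candidate windows whose L1 distance to the read is large can be discarded without computing an alignment; a spatial index such as a kd-tree makes finding the nearest vectors fast.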
Affiliation(s)
- Md Pavel Mahmud
- Department of Computer Science, Rutgers University, New Jersey, USA.
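The q-gram embedding described in this abstract can be illustrated with a small sketch. It assumes unit-cost edits rather than the paper's affine-gap costs, and it uses the classic argument that one substitution, insertion or deletion changes a q-gram profile by at most 2q in L1 distance, so ceil(L1 / 2q) is a lower bound on the edit distance. The cache-oblivious kd-tree is replaced by a plain linear scan over reference windows. All function names are hypothetical and this is not TreQ's implementation.

```python
from collections import Counter


def qgram_profile(seq: str, q: int = 3) -> Counter:
    """Count every overlapping q-gram in seq (trinucleotides when q = 3)."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def l1_distance(a: Counter, b: Counter) -> int:
    """L1 (Manhattan) distance between two sparse q-gram count vectors."""
    return sum(abs(a[k] - b[k]) for k in a.keys() | b.keys())


def unit_edit_lower_bound(x: str, y: str, q: int = 3) -> int:
    """Lower bound on unit-cost edit distance: one edit changes the profile
    by at most 2q in L1, so ed(x, y) >= ceil(L1 / 2q)."""
    d = l1_distance(qgram_profile(x, q), qgram_profile(y, q))
    return -(-d // (2 * q))  # integer ceiling division


def nearest_window(read: str, reference: str, q: int = 3):
    """Linear-scan nearest neighbour (a stand-in for the kd-tree) over all
    reference windows of the read's length; returns (start, L1 distance)."""
    read_profile = qgram_profile(read, q)
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        d = l1_distance(read_profile, qgram_profile(window, q))
        if best is None or d < best[1]:
            best = (start, d)
    return best


print(nearest_window("ACGTACGGA", "TTTTTACGTACGGATTTTT"))  # (5, 0)
print(unit_edit_lower_bound("ACGTACGT", "ACGAACGT"))        # 1
```

Because the bound never exceeds the true edit distance, windows whose profile is far from the read's profile in L1 can be discarded without computing an alignment. A spatial index such as a kd-tree does the same pruning without visiting every window.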
11
Menzel P, Frellsen J, Plass M, Rasmussen SH, Krogh A. On the accuracy of short read mapping. Methods Mol Biol 2013; 1038:39-59. [PMID: 23872968 DOI: 10.1007/978-1-62703-514-9_3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
The development of high-throughput sequencing technologies has revolutionized the way we study genomes and gene regulation. A single experiment produces millions of reads. To gain knowledge from these experiments, the first step is to find the genomic origin of the reads, i.e., to map the reads to a reference genome. In this new situation, conventional alignment tools are obsolete, as they cannot handle such large amounts of data in a reasonable amount of time. Thus, new mapping algorithms have been developed that are fast at the expense of a small decrease in accuracy. In this chapter we discuss the current problems in short read mapping and show that mapping reads correctly is a nontrivial task. Through simple experiments with both real and synthetic data, we demonstrate that different mappers can give different results depending on the type of data, and that a considerable fraction of uniquely mapped reads is potentially mapped to an incorrect location. Furthermore, we provide simple statistical results on the expected number of random matches in a genome (E-value) and the probability of a random match as a function of read length. Finally, we show that quality scores contain valuable information for mapping and explain why mapping quality should be evaluated in a probabilistic manner. We conclude by discussing the potential for improving current methods by incorporating these quality scores into a probabilistic mapping program.
Affiliation(s)
- Peter Menzel
- Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
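The E-value and random-match probability mentioned in this abstract can be approximated under a simplified model. This is an assumption, not necessarily the chapter's exact derivation: the genome is treated as an i.i.d. uniform random sequence, both strands are searched, and at most k substitutions are allowed. The expected number of chance hits is then E = 2(G - L + 1) * sum_{i=0..k} C(L, i) (3/4)^i (1/4)^(L-i), and P(at least one hit) is about 1 - exp(-E). The function names are illustrative.

```python
import math


def expected_random_matches(genome_len: int, read_len: int,
                            mismatches: int = 0, both_strands: bool = True) -> float:
    """Expected number of chance hits of a read in an i.i.d. uniform random genome."""
    positions = genome_len - read_len + 1
    # P(a random L-mer lies within `mismatches` substitutions of the read)
    p_hit = sum(math.comb(read_len, i) * 0.75 ** i * 0.25 ** (read_len - i)
                for i in range(mismatches + 1))
    return (2 if both_strands else 1) * positions * p_hit


def prob_random_match(genome_len: int, read_len: int, mismatches: int = 0) -> float:
    """Poisson approximation: P(at least one chance hit) = 1 - exp(-E)."""
    return 1.0 - math.exp(-expected_random_matches(genome_len, read_len, mismatches))


for L in (12, 16, 20, 24):
    print(L, expected_random_matches(3_000_000_000, L), prob_random_match(3_000_000_000, L))
```

For a genome of 3 Gb, this model predicts roughly 1.4 chance exact hits for a 16-nt read but only about 0.005 for a 20-nt read. Real genomes contain repeats and are far from uniform, so these figures should be read as optimistic lower estimates of ambiguity.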
12
Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model. BMC Bioinformatics 2012; 13:303. [PMID: 23151247 PMCID: PMC3534400 DOI: 10.1186/1471-2105-13-303] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 04/06/2012] [Accepted: 11/01/2012] [Indexed: 11/25/2022]
Abstract
Background: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of applications, such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to insertion and deletion base-calling errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative about whether an insertion or a deletion error is more likely. Surprisingly, little effort has been devoted to developing improved base-calling methods and more intuitive quality scores for this platform.
Results: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed from the estimated probabilities of each homopolymer length, which are easily transformed into useful quality scores.
Conclusions: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative about possible insertion and deletion errors, while maintaining a base-calling accuracy better than that of the current 454 base-calling method. Given the generality of the framework, HPCall could also be adapted to other homopolymer-sensitive sequencing technologies.
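The way a hurdle Poisson distribution over homopolymer lengths yields both a call and insertion/deletion-specific error probabilities can be sketched as follows. In HPCall, the model parameters come from a weighted regression on 454 noise predictors. Here the zero probability `pi0` and the Poisson rate `lam` are simply supplied as hypothetical values for a single flow. The split of the residual probability into overcall (insertion) and undercall (deletion) risk is an illustrative assumption about how such probabilities might be summarised, not HPCall's published quality-score definition.

```python
import math


def hurdle_poisson_pmf(y: int, pi0: float, lam: float) -> float:
    """Hurdle Poisson: P(0) = pi0; for y >= 1 a zero-truncated Poisson scaled by 1 - pi0."""
    if y == 0:
        return pi0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return (1.0 - pi0) * poisson / (1.0 - math.exp(-lam))


def call_homopolymer(pi0: float, lam: float, max_len: int = 15):
    """Call the most probable homopolymer length for one flow and split the
    remaining probability into overcall (insertion) and undercall (deletion) risk."""
    probs = [hurdle_poisson_pmf(y, pi0, lam) for y in range(max_len + 1)]
    total = sum(probs)                      # renormalise the truncated support
    probs = [p / total for p in probs]
    call = max(range(len(probs)), key=probs.__getitem__)
    p_insertion = sum(probs[:call])         # true length shorter than the call
    p_deletion = sum(probs[call + 1:])      # true length longer than the call
    error = p_insertion + p_deletion
    phred = -10 * math.log10(error) if error > 0 else float("inf")
    return call, p_insertion, p_deletion, phred


call, p_ins, p_del, phred = call_homopolymer(pi0=0.05, lam=3.2)
print(call, round(p_ins, 3), round(p_del, 3), round(phred, 2))
```

The hurdle keeps the probability of "no incorporation" (length 0) separate from the distribution of positive lengths, which a plain Poisson cannot do. Reporting `p_insertion` and `p_deletion` separately gives the kind of directional information that a single Phred score hides.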
13
Frith MC, Mori R, Asai K. A mostly traditional approach improves alignment of bisulfite-converted DNA. Nucleic Acids Res 2012; 40:e100. [PMID: 22457070 PMCID: PMC3401460 DOI: 10.1093/nar/gks275] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 01/15/2023]
Abstract
Cytosines in genomic DNA are sometimes methylated. This affects many biological processes and diseases. The standard way of measuring methylation is to treat the DNA with bisulfite, which converts unmethylated cytosines to uracils (subsequently read as thymines), and then to sequence the DNA and compare it with a reference genome sequence. We describe a method for the critical step of aligning the DNA reads to the correct genomic locations. Our method builds on classic alignment techniques, including likelihood-ratio scores and spaced seeds. In a realistic benchmark, our method has a better combination of sensitivity, specificity and speed than nine other high-throughput bisulfite aligners. This study enables more accurate and rational analysis of DNA methylation. It also illustrates how to adapt general-purpose alignment methods to a special case with distorted base patterns, which should be informative for other special cases such as ancient DNA and AT-rich genomes.
Collapse
Affiliation(s)
- Martin C Frith
- Computational Biology Research Center, National Institute for Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.
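The likelihood-ratio scoring idea from this abstract can be illustrated with a toy sketch for bisulfite reads. It assumes reads come from the converted forward strand, a hypothetical 70% chance that a reference C is read as T, and a 1% per-base error rate. Each score is log2(P(read base | reference base) / 0.25), which gives an asymmetric matrix: reference C / read T scores positively, while reference T / read C is penalised. The scan is ungapped and uses no spaced seeds, so it shows only the scoring component, not the authors' aligner. Function names and parameter values are illustrative.

```python
import math

BASES = "ACGT"


def bisulfite_score_matrix(c_to_t: float = 0.7, error: float = 0.01):
    """Log-odds scores s[ref][read] = log2(P(read | ref) / 0.25) for reads
    from the bisulfite-converted forward strand (hypothetical rates)."""
    scores = {}
    for ref in BASES:
        p = {b: error for b in BASES}
        if ref == "C":
            p["T"] = c_to_t                      # unmethylated C sequenced as T
            p["C"] = 1.0 - c_to_t - 2 * error    # methylated C stays C
        else:
            p[ref] = 1.0 - 3 * error
        scores[ref] = {b: math.log2(p[b] / 0.25) for b in BASES}
    return scores


def best_ungapped_offset(read: str, reference: str, scores):
    """Score the read at every ungapped offset; return (offset, score) of the best."""
    best = None
    for off in range(len(reference) - len(read) + 1):
        window = reference[off:off + len(read)]
        s = sum(scores[r][q] for r, q in zip(window, read))
        if best is None or s > best[1]:
            best = (off, s)
    return best


# "ATGTTTA" is reference segment "ACGTTCA" with both cytosines converted to T.
print(best_ungapped_offset("ATGTTTA", "GGACGTTCAGG", bisulfite_score_matrix()))
```

With a symmetric, conversion-unaware matrix, the two C-to-T positions would count as mismatches. The asymmetric log-odds matrix instead rewards them at the true offset, which is the kind of adaptation of a standard scoring scheme that the abstract describes.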