1
Gibbons TR, Mount SM, Cooper ED, Delwiche CF. Erratum to: Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm. BMC Bioinformatics 2015; 16:274. [PMID: 26315999 PMCID: PMC4552414 DOI: 10.1186/s12859-015-0690-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/30/2015] [Indexed: 11/25/2022] [Open access]
Affiliation(s)
- Theodore R Gibbons
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, 20742, Maryland.
- Stephen M Mount
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, 20742, Maryland.
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, 20742, Maryland.
- Endymion D Cooper
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, 20742, Maryland.
- Charles F Delwiche
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, 20742, Maryland.
  - Maryland Agricultural Experiment Station, University of Maryland, College Park, 20742, Maryland.
2
Gibbons TR, Mount SM, Cooper ED, Delwiche CF. Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm. BMC Bioinformatics 2015; 16:218. [PMID: 26160651 PMCID: PMC4496851 DOI: 10.1186/s12859-015-0625-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/16/2014] [Accepted: 05/20/2015] [Indexed: 11/10/2022] [Open access]
Abstract
Background: Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here.
Results: All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well as or better than the other three metrics in all scenarios.
Conclusions: The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation, Porthos, which uses only standard libraries and can process a graph with 25M+ edges connecting the 60K+ KOG sequences in half a minute using less than half a gigabyte of memory.
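The graph construction described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Porthos implementation: it assumes BLAST tabular output with the default 12 columns (E-value in column 11, bit score in column 12), symmetrizes the graph by keeping the largest weight seen for each unordered pair of sequences, and caps the NLE at a hypothetical value of 200 when the E-value is reported as 0. The names `nle` and `build_edges` are made up for this sketch, and only the two metrics with unambiguous definitions (bit score and NLE) are shown.

```python
import math


def nle(evalue, cap=200.0):
    """Negative common log of the E-value.

    An E-value of 0 has no logarithm, so it is mapped to `cap`
    (an assumed ceiling chosen for this sketch).
    """
    if evalue <= 0.0:
        return cap
    return min(cap, -math.log10(evalue))


def build_edges(blast_lines, metric="bitscore"):
    """Build an undirected weighted graph from BLAST tabular hits.

    Returns a dict mapping a sorted (seq_a, seq_b) pair to its weight.
    Self-hits are skipped; when a pair appears more than once (for
    example once in each direction), the largest weight is kept.
    """
    edges = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        if query == subject:
            continue  # a self-hit is not an edge between two sequences
        evalue, bitscore = float(fields[10]), float(fields[11])
        weight = bitscore if metric == "bitscore" else nle(evalue)
        pair = tuple(sorted((query, subject)))
        if weight > edges.get(pair, float("-inf")):
            edges[pair] = weight
    return edges


if __name__ == "__main__":
    hits = [
        "a\tb\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t180.0",
        "b\ta\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-48\t175.0",
    ]
    for (left, right), weight in build_edges(hits).items():
        print(left, right, weight)
```

Each resulting `(label, label, weight)` triple could then be written out as one line of a label-pair edge list for a clustering tool to consume.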
Affiliation(s)
- Theodore R Gibbons
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Baltimore, 20742, Maryland.
- Stephen M Mount
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Baltimore, 20742, Maryland.
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Baltimore, 20742, Maryland.
- Endymion D Cooper
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Baltimore, 20742, Maryland.
- Charles F Delwiche
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Baltimore, 20742, Maryland.
  - Maryland Agricultural Experiment Station, University of Maryland, College Park, Baltimore, 20742, Maryland.
3
Ju C, Van de Poel B, Cooper ED, Thierer JH, Gibbons TR, Delwiche CF, Chang C. Conservation of ethylene as a plant hormone over 450 million years of evolution. Nat Plants 2015. [PMID: 27246051 DOI: 10.1038/nplants.2014.4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/15/2023]
Abstract
Land plants evolved more than 450 million years ago from a lineage of freshwater charophyte green algae(1). The extent to which plant signalling systems existed before the evolutionary transition to land is unknown. Although charophytes occupy a key phylogenetic position for elucidating the origins of such signalling systems(2-4), there is a paucity of sequence data for these organisms(5,6). Here we carry out de novo transcriptomics of five representative charophyte species, and find putative homologues for the biosynthesis, transport, perception and signalling of major plant hormones. Focusing on the plant hormone ethylene, we provide evidence that the filamentous charophyte Spirogyra pratensis possesses an ethylene hormone system homologous to that in plants. Spirogyra produces ethylene and exhibits a cell elongation response to ethylene. Spirogyra ethylene-signalling homologues partially rescue mutants of the angiosperm Arabidopsis thaliana and respond post-translationally to ethylene when expressed in plant cells, indicative of unambiguously homologous ethylene-signalling pathways in Spirogyra and Arabidopsis. These findings imply that the common aquatic ancestor possessed this pathway prior to the colonization of land and that cell elongation was possibly an ancestral ethylene response. This highlights the importance of charophytes for investigating the origins of fundamental plant processes.
Affiliation(s)
- Chuanli Ju
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Bram Van de Poel
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Endymion D Cooper
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- James H Thierer
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Theodore R Gibbons
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Charles F Delwiche
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Caren Chang
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
4
Ju C, Van de Poel B, Cooper ED, Thierer JH, Gibbons TR, Delwiche CF, Chang C. Conservation of ethylene as a plant hormone over 450 million years of evolution. Nat Plants 2015; 1:14004. [PMID: 27246051 DOI: 10.1038/nplants.2014.4] [Citation(s) in RCA: 139] [Impact Index Per Article: 15.4] [Received: 05/02/2014] [Accepted: 10/16/2014] [Indexed: 05/20/2023]
5
Abstract
Metagenomic methods provide a powerful means to investigate complex ecological phenomena. Developed originally for study of Bacteria and Archaea, the application of these methods to eukaryotic microorganisms is yet to be fully realized. Most prior environmental molecular studies of eukaryotes have relied heavily on PCR amplification with eukaryote-specific primers. Here we apply high-throughput short-read sequencing of poly-A selected RNA to capture the metatranscriptome of an estuarine dinoflagellate bloom. To validate the metatranscriptome assembly process, we simulated metatranscriptomic datasets using short-read sequencing data from clonal cultures of four algae of varying phylogenetic distance. We find that the proportion of chimeric transcripts reconstructed from community transcriptome sequencing is low, suggesting that metatranscriptomic sequencing can be used to accurately reconstruct the transcripts expressed by bloom-forming communities of eukaryotes. To further validate the bloom metatranscriptome assembly, we compared it to a transcriptomic assembly from a cultured, clonal isolate of the dominant bloom-causing alga and found that the two assemblies are highly similar. Eukaryote-wide phylogenetic analyses reveal the taxonomic composition of the bloom community, which comprises several dinoflagellates, ciliates, animals, and fungi. The assembled metatranscriptome reveals the functional genomic composition of a metabolically active community. Highlighting the potential power of these methods, we found that relative transcript abundance patterns suggest that the dominant dinoflagellate might be expressing toxin biosynthesis related genes at a higher level in the presence of competitors, predators and prey than when growing in monoculture.
Affiliation(s)
- Endymion D. Cooper
  - CMNS-Cell Biology and Molecular Genetics, 2107 Bioscience Research Building, University of Maryland, College Park, MD 20742-4407, USA
- Bastian Bentlage
  - CMNS-Cell Biology and Molecular Genetics, 2107 Bioscience Research Building, University of Maryland, College Park, MD 20742-4407, USA
- Theodore R. Gibbons
  - CMNS-Cell Biology and Molecular Genetics, 2107 Bioscience Research Building, University of Maryland, College Park, MD 20742-4407, USA
- Tsvetan R. Bachvaroff
  - Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, 701 E Pratt St., Baltimore, MD 21202, USA
- Charles F. Delwiche
  - CMNS-Cell Biology and Molecular Genetics, 2107 Bioscience Research Building, University of Maryland, College Park, MD 20742-4407, USA
  - Maryland Agricultural Experiment Station, AGNR, University of Maryland, College Park, MD 20742, USA
6
Liu B, Faller LL, Klitgord N, Mazumdar V, Ghodsi M, Sommer DD, Gibbons TR, Treangen TJ, Chang YC, Li S, Stine OC, Hasturk H, Kasif S, Segrè D, Pop M, Amar S. Deep sequencing of the oral microbiome reveals signatures of periodontal disease. PLoS One 2012; 7:e37919. [PMID: 22675498 PMCID: PMC3366996 DOI: 10.1371/journal.pone.0037919] [Citation(s) in RCA: 263] [Impact Index Per Article: 21.9] [Received: 12/12/2011] [Accepted: 04/30/2012] [Indexed: 11/18/2022] [Open access]
Abstract
The oral microbiome, the complex ecosystem of microbes inhabiting the human mouth, harbors several thousands of bacterial types. The proliferation of pathogenic bacteria within the mouth gives rise to periodontitis, an inflammatory disease known to also constitute a risk factor for cardiovascular disease. While much is known about individual species associated with pathogenesis, the system-level mechanisms underlying the transition from health to disease are still poorly understood. Through the sequencing of the 16S rRNA gene and of whole community DNA we provide a glimpse at the global genetic, metabolic, and ecological changes associated with periodontitis in 15 subgingival plaque samples, four from each of two periodontitis patients, and the remaining samples from three healthy individuals. We also demonstrate the power of whole-metagenome sequencing approaches in characterizing the genomes of key players in the oral microbiome, including an unculturable TM7 organism. We reveal the disease microbiome to be enriched in virulence factors, and adapted to a parasitic lifestyle that takes advantage of the disrupted host homeostasis. Furthermore, diseased samples share a common structure that was not found in completely healthy samples, suggesting that the disease state may occupy a narrow region within the space of possible configurations of the oral microbiome. Our pilot study demonstrates the power of high-throughput sequencing as a tool for understanding the role of the oral microbiome in periodontal disease. Despite a modest level of sequencing (~2 lanes Illumina 76 bp PE) and high human DNA contamination (up to ~90%) we were able to partially reconstruct several oral microbes and to preliminarily characterize some systems-level differences between the healthy and diseased oral microbiomes.
Affiliation(s)
- Bo Liu
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
  - Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
- Lina L. Faller
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Niels Klitgord
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Varun Mazumdar
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Mohammad Ghodsi
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
  - Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
- Daniel D. Sommer
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Theodore R. Gibbons
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
  - Biological Sciences Graduate Program, University of Maryland, College Park, Maryland, United States of America
- Todd J. Treangen
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
  - The McKusick-Nathans Institute for Genetic Medicine, The Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Yi-Chien Chang
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
- Shan Li
  - Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- O. Colin Stine
  - Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Hatice Hasturk
  - The Forsyth Institute, Department of Periodontology, Cambridge, Massachusetts, United States of America
- Simon Kasif
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
  - Children’s Informatics Program, Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Boston, Massachusetts, United States of America
- Daniel Segrè
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
  - Department of Biology, Boston University, Boston, Massachusetts, United States of America
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Mihai Pop
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
  - Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
  - Biological Sciences Graduate Program, University of Maryland, College Park, Maryland, United States of America
- Salomon Amar
  - Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
  - Center for Anti-Inflammatory Therapeutics, Boston University Goldman School of Dental Medicine, Boston, Massachusetts, United States of America
7
Gibbons TR, Concepcion GT, Bachvaroff TR, Delwiche CF. Evaluating short-read sequence data from the highly redundant, novel transcriptome of Polarella glacialis. Genome Biol 2011. [PMCID: PMC3439037 DOI: 10.1186/gb-2011-12-s1-p5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] [Open access]