1
|
Gaschignard G, Millet M, Bruley A, Benzerara K, Dezi M, Skouri-Panet F, Duprat E, Callebaut I. AlphaFold2-guided description of CoBaHMA, a novel family of bacterial domains within the heavy-metal-associated superfamily. Proteins 2024; 92:776-794. [PMID: 38258321 DOI: 10.1002/prot.26668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 12/22/2023] [Accepted: 01/01/2024] [Indexed: 01/24/2024]
Abstract
Three-dimensional (3D) structure information, now available at the proteome scale, may facilitate the detection of remote evolutionary relationships in protein superfamilies. Here, we illustrate this with the identification of a novel family of protein domains related to the ferredoxin-like superfold, by combining (i) transitive sequence similarity searches, (ii) clustering approaches, and (iii) the use of AlphaFold2 3D structure models. Domains of this family were initially identified in relation with the intracellular biomineralization of calcium carbonates by Cyanobacteria. They are part of the large heavy-metal-associated (HMA) superfamily, departing from the latter by specific sequence and structural features. In particular, most of them share conserved basic amino acids (hence their name CoBaHMA for Conserved Basic residues HMA), forming a positively charged surface, which is likely to interact with anionic partners. CoBaHMA domains are found in diverse modular organizations in bacteria, existing in the form of monodomain proteins or as part of larger proteins, some of which are membrane proteins involved in transport or lipid metabolism. This suggests that the CoBaHMA domains may exert a regulatory function, involving interactions with anionic lipids. This hypothesis might have a particular resonance in the context of the compartmentalization observed for cyanobacterial intracellular calcium carbonates.
Affiliation(s)
- Geoffroy Gaschignard
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Maxime Millet
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Apolline Bruley
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Karim Benzerara
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Manuela Dezi
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Feriel Skouri-Panet
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Elodie Duprat
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
|
2
|
Beller ZW, Wesener DA, Seebeck TR, Guruge JL, Byrne AE, Henrissat S, Terrapon N, Henrissat B, Rodionov DA, Osterman AL, Suarez C, Bacalzo NP, Chen Y, Couture G, Lebrilla CB, Zhang Z, Eastlund ER, McCann CH, Davis GD, Gordon JI. Inducible CRISPR-targeted "knockdown" of human gut Bacteroides in gnotobiotic mice discloses glycan utilization strategies. Proc Natl Acad Sci U S A 2023; 120:e2311422120. [PMID: 37733741 PMCID: PMC10523453 DOI: 10.1073/pnas.2311422120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 08/08/2023] [Indexed: 09/23/2023] Open
Abstract
Understanding how members of the human gut microbiota prioritize nutrient resources is one component of a larger effort to decipher the mechanisms defining microbial community robustness and resiliency in health and disease. This knowledge is foundational for development of microbiota-directed therapeutics. To model how bacteria prioritize glycans in the gut, germfree mice were colonized with 13 human gut bacterial strains, including seven saccharolytic Bacteroidaceae species. Animals were fed a Western diet supplemented with pea fiber. After community assembly, an inducible CRISPR-based system was used to selectively and temporarily reduce the absolute abundance of Bacteroides thetaiotaomicron or B. cellulosilyticus by 10- to 60-fold. Each knockdown resulted in specific, reproducible increases in the abundances of other Bacteroidaceae and dynamic alterations in their expression of genes involved in glycan utilization. Emergence of these "alternate consumers" was associated with preservation of community saccharolytic activity. Using an inducible system for CRISPR base editing in vitro, we disrupted translation of transporters critical for utilizing dietary polysaccharides in Phocaeicola vulgatus, a B. cellulosilyticus knockdown-responsive taxon. In vitro and in vivo tests of the resulting P. vulgatus mutants allowed us to further characterize mechanisms associated with its increased fitness after knockdown. In principle, the approach described can be applied to study utilization of a range of nutrients and to preclinical efforts designed to develop therapeutic strategies for precision manipulation of microbial communities.
Affiliation(s)
- Zachary W. Beller
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110
  - Center for Gut Microbiome and Nutrition Research, Washington University School of Medicine, St. Louis, MO 63110
- Darryl A. Wesener
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110
  - Center for Gut Microbiome and Nutrition Research, Washington University School of Medicine, St. Louis, MO 63110
- Timothy R. Seebeck
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110
  - Center for Gut Microbiome and Nutrition Research, Washington University School of Medicine, St. Louis, MO 63110
  - Genome Engineering R&D, MilliporeSigma, the Life Science business Merck KGaA, Darmstadt, Germany, St. Louis, MO 63103
- Janaki L. Guruge
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110
  - Center for Gut Microbiome and Nutrition Research, Washington University School of Medicine, St. Louis, MO 63110
- Alexandra E. Byrne
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110
  - Center for Gut Microbiome and Nutrition Research, Washington University School of Medicine, St. Louis, MO 63110
- Suzanne Henrissat
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110
  - Center for Gut Microbiome and Nutrition Research, Washington University School of Medicine, St. Louis, MO 63110
  - Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique and Aix-Marseille University, 13288 Marseille, France
- Nicolas Terrapon
  - Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique and Aix-Marseille University, 13288 Marseille, France
- Bernard Henrissat
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby DK-2800, Denmark
  - Department of Biological Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Dmitry A. Rodionov
  - Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037
- Andrei L. Osterman
  - Infectious and Inflammatory Disease Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037
- Chris Suarez
  - Department of Chemistry, University of California, Davis, CA 95616
- Ye Chen
  - Department of Chemistry, University of California, Davis, CA 95616
- Garret Couture
  - Department of Chemistry, University of California, Davis, CA 95616
- Zhigang Zhang
  - Genome Engineering R&D, MilliporeSigma, the Life Science business Merck KGaA, Darmstadt, Germany, St. Louis, MO 63103
- Erik R. Eastlund
  - Genome Engineering R&D, MilliporeSigma, the Life Science business Merck KGaA, Darmstadt, Germany, St. Louis, MO 63103
- Caitlin H. McCann
  - Genome Engineering R&D, MilliporeSigma, the Life Science business Merck KGaA, Darmstadt, Germany, St. Louis, MO 63103
- Gregory D. Davis
  - Genome Engineering R&D, MilliporeSigma, the Life Science business Merck KGaA, Darmstadt, Germany, St. Louis, MO 63103
- Jeffrey I. Gordon
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110
  - Center for Gut Microbiome and Nutrition Research, Washington University School of Medicine, St. Louis, MO 63110
|
3
|
Anjum N, Nabil RL, Rafi RI, Bayzid MS, Rahman MS. CD-MAWS: An Alignment-Free Phylogeny Estimation Method Using Cosine Distance on Minimal Absent Word Sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:196-205. [PMID: 34928803 DOI: 10.1109/tcbb.2021.3136792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Multiple sequence alignment has been the traditional and well-established approach to sequence analysis and comparison, though it is time- and memory-consuming. As the scale of sequencing data increases day by day, the importance of faster yet accurate alignment-free methods is on the rise. Several alignment-free sequence analysis methods have been established in the literature in recent years, which extract numerical features from genomic data to analyze sequences and also to estimate phylogenetic relationships among genes and species. Minimal Absent Word (MAW) is an effective concept for representing characteristics of a sequence in an alignment-free manner. In this study, we present CD-MAWS, a distance measure based on the cosine of the angle between composition vectors constructed using minimal absent words, for sequence analysis in a computationally inexpensive manner. We have benchmarked CD-MAWS using several AFProject datasets, such as Fish mtDNA, E. coli, Plants, Shigella and Yersinia datasets, and found it to perform quite well. Applied to several other biological datasets such as mammal mtDNA, bacterial genomes and viral genomes, CD-MAWS resolved phylogenetic relationships similar to or better than state-of-the-art alignment-free methods such as Mash, Skmer, Co-phylog and kSNP3.
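The idea in this abstract (a cosine distance between vectors built from minimal absent words) can be illustrated with a small sketch. This is not the CD-MAWS implementation: the abstract does not specify how the composition vectors are built, so the sketch assumes binary indicator vectors over the union of the two MAW sets, finds MAWs by brute force, and only considers words of length 2 up to a hypothetical `max_len` cutoff. The function names are illustrative.

```python
from math import sqrt


def substrings(seq, max_len):
    """All substrings of seq with length 1..max_len."""
    return {seq[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(seq) - n + 1)}


def minimal_absent_words(seq, max_len=4, alphabet="ACGT"):
    """Brute-force MAWs of length 2..max_len.

    A word w is a minimal absent word of seq if w does not occur in seq
    but both w[1:] and w[:-1] do.
    """
    present = substrings(seq, max_len)
    maws = set()
    for suffix in present:            # suffix = w[1:], which occurs in seq
        if len(suffix) >= max_len:
            continue
        for letter in alphabet:
            word = letter + suffix
            if word not in present and word[:-1] in present:
                maws.add(word)
    return maws


def cosine_distance(set_a, set_b):
    """Cosine distance between binary indicator vectors of two sets
    (assumption: the paper's actual vector weighting may differ)."""
    if not set_a or not set_b:
        return 0.0 if set_a == set_b else 1.0
    shared = len(set_a & set_b)       # dot product of indicator vectors
    return 1.0 - shared / sqrt(len(set_a) * len(set_b))


def maw_cosine_distance(seq_a, seq_b, max_len=4):
    """Distance between two sequences via their MAW sets."""
    return cosine_distance(minimal_absent_words(seq_a, max_len),
                           minimal_absent_words(seq_b, max_len))
```

A pairwise matrix of such distances could then be passed to a standard tree-building method such as neighbor joining. The brute-force enumeration is only practical for short sequences and small `max_len`.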
|
4
|
Terrapon N, Lombard V, Drula É, Lapébie P, Al-Masaudi S, Gilbert HJ, Henrissat B. PULDB: the expanded database of Polysaccharide Utilization Loci. Nucleic Acids Res 2018; 46:D677-D683. [PMID: 29088389 PMCID: PMC5753385 DOI: 10.1093/nar/gkx1022] [Citation(s) in RCA: 187] [Impact Index Per Article: 31.2] [Received: 09/17/2017] [Accepted: 10/25/2017] [Indexed: 12/12/2022] Open
Abstract
The Polysaccharide Utilization Loci (PUL) database was launched in 2015 to present PUL predictions in ∼70 Bacteroidetes species isolated from the human gastrointestinal tract, as well as PULs derived from the experimental data reported in the literature. In 2018 PULDB offers access to 820 genomes, sampled from various environments and covering a much wider taxonomical range. A Krona dynamic chart was set up to facilitate browsing through taxonomy. Literature surveys now allow the presentation of the most recent (i) PUL repertoires deduced from RNAseq large-scale experiments, (ii) PULs that have been subjected to in-depth biochemical analysis and (iii) new Carbohydrate-Active enzyme (CAZyme) families that contributed to the refinement of PUL predictions. To improve PUL visualization and genome browsing, the previous annotation of genes encoding CAZymes, regulators, integrases and SusCD has now been expanded to include functionally relevant protein families whose genes are significantly found in the vicinity of PULs: sulfatases, proteases, ROK repressors, epimerases and ATP-Binding Cassette and Major Facilitator Superfamily transporters. To cope with cases where susCD may be absent due to incomplete assemblies/split PULs, we present ‘CAZyme cluster’ predictions. Finally, a PUL alignment tool, operating on the tagged families instead of amino-acid sequences, was integrated to retrieve PULs similar to a query of interest. The updated PULDB website is accessible at www.cazy.org/PULDB_new/
Affiliation(s)
- Nicolas Terrapon
  - Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille Université, F-13288 Marseille, France
  - USC1408 Architecture et Fonction des Macromolécules Biologiques, Institut National de la Recherche Agronomique, F-13288 Marseille, France
- Vincent Lombard
  - Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille Université, F-13288 Marseille, France
  - USC1408 Architecture et Fonction des Macromolécules Biologiques, Institut National de la Recherche Agronomique, F-13288 Marseille, France
- Élodie Drula
  - Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille Université, F-13288 Marseille, France
  - USC1408 Architecture et Fonction des Macromolécules Biologiques, Institut National de la Recherche Agronomique, F-13288 Marseille, France
- Pascal Lapébie
  - Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille Université, F-13288 Marseille, France
  - USC1408 Architecture et Fonction des Macromolécules Biologiques, Institut National de la Recherche Agronomique, F-13288 Marseille, France
- Saad Al-Masaudi
  - Department of Biological Sciences, King Abdulaziz University, 23218 Jeddah, Saudi Arabia
- Harry J Gilbert
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Bernard Henrissat
  - Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille Université, F-13288 Marseille, France
  - USC1408 Architecture et Fonction des Macromolécules Biologiques, Institut National de la Recherche Agronomique, F-13288 Marseille, France
  - Department of Biological Sciences, King Abdulaziz University, 23218 Jeddah, Saudi Arabia
|
5
|
Zielezinski A, Girgis HZ, Bernard G, Leimeister CA, Tang K, Dencker T, Lau AK, Röhling S, Choi JJ, Waterman MS, Comin M, Kim SH, Vinga S, Almeida JS, Chan CX, James BT, Sun F, Morgenstern B, Karlowski WM. Benchmarking of alignment-free sequence comparison methods. Genome Biol 2019; 20:144. [PMID: 31345254 PMCID: PMC6659240 DOI: 10.1186/s13059-019-1755-7] [Citation(s) in RCA: 113] [Impact Index Per Article: 18.8] [Received: 05/16/2019] [Accepted: 07/03/2019] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. RESULTS: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. CONCLUSION: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.
Affiliation(s)
- Andrzej Zielezinski
  - Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznańskiego 6, 61-614, Poznan, Poland
- Hani Z Girgis
  - Tandy School of Computer Science, The University of Tulsa, 800 South Tucker Drive, Tulsa, OK, 74104, USA
- Chris-Andre Leimeister
  - Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
- Kujin Tang
  - Department of Biological Sciences, Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
- Thomas Dencker
  - Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
- Anna Katharina Lau
  - Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
- Sophie Röhling
  - Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
- Jae Jin Choi
  - Department of Chemistry, University of California, Berkeley, CA, 94720, USA
  - Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Michael S Waterman
  - Department of Biological Sciences, Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
- Matteo Comin
  - Department of Information Engineering, University of Padova, Padova, Italy
- Sung-Hou Kim
  - Department of Chemistry, University of California, Berkeley, CA, 94720, USA
  - Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Susana Vinga
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
  - IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
- Jonas S Almeida
  - Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NIH/NCI), Bethesda, USA
- Cheong Xin Chan
  - Institute for Molecular Bioscience, and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
- Benjamin T James
  - Tandy School of Computer Science, The University of Tulsa, 800 South Tucker Drive, Tulsa, OK, 74104, USA
- Fengzhu Sun
  - Department of Biological Sciences, Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
  - Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
- Burkhard Morgenstern
  - Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
- Wojciech M Karlowski
  - Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznańskiego 6, 61-614, Poznan, Poland
|
6
|
Carroll HD, Spouge JL, Gonzalez M. MultiDomainBenchmark: a multi-domain query and subject database suite. BMC Bioinformatics 2019; 20:77. [PMID: 30764761 PMCID: PMC6376684 DOI: 10.1186/s12859-019-2660-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/14/2018] [Accepted: 01/28/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Genetic sequence database retrieval benchmarks play an essential role in evaluating the performance of sequence searching tools. To date, all phylogenetically diverse benchmarks known to the authors include only query sequences with single protein domains. Domains are the primary building blocks of protein structure and function. Independently, each domain can fulfill a single function, but most proteins (>80% in Metazoa) exist as multi-domain proteins. Multiple domain units combine in various arrangements or architectures to create different functions and are often under evolutionary pressures to yield new ones. Thus, it is crucial to create gold standards reflecting the multi-domain complexity of real proteins to more accurately evaluate sequence searching tools. DESCRIPTION: This work introduces MultiDomainBenchmark (MDB), a database suite of 412 curated multi-domain queries and 227,512 target sequences, representing at least 5108 species and 1123 phylogenetically divergent protein families, their relevancy annotation, and domain location. Here, we use the benchmark to evaluate the performance of two commonly used sequence searching tools, BLAST/PSI-BLAST and HMMER. Additionally, we introduce a novel classification technique for multi-domain proteins to evaluate how well an algorithm recovers a domain architecture. CONCLUSION: MDB is publicly available at http://csc.columbusstate.edu/carroll/MDB/.
Affiliation(s)
- Hyrum D. Carroll
  - TSYS School of Computer Science, Columbus State University, 4225 University Avenue, Columbus, 31907 GA USA
- John L. Spouge
  - National Center for Biotechnology Information, Bethesda, National Institutes of Health, 8600 Rockville Pike, Bethesda, 20894 MD USA
- Mileidy Gonzalez
  - National Center for Biotechnology Information, Bethesda, National Institutes of Health, 8600 Rockville Pike, Bethesda, 20894 MD USA
|
7
|
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this will directly impact which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
|
8
|
Cartmell A, Muñoz-Muñoz J, Briggs JA, Ndeh DA, Lowe EC, Baslé A, Terrapon N, Stott K, Heunis T, Gray J, Yu L, Dupree P, Fernandes PZ, Shah S, Williams SJ, Labourel A, Trost M, Henrissat B, Gilbert HJ. A surface endogalactanase in Bacteroides thetaiotaomicron confers keystone status for arabinogalactan degradation. Nat Microbiol 2018; 3:1314-1326. [PMID: 30349080 PMCID: PMC6217937 DOI: 10.1038/s41564-018-0258-8] [Citation(s) in RCA: 100] [Impact Index Per Article: 14.3] [Received: 11/29/2017] [Accepted: 08/30/2018] [Indexed: 12/24/2022]
Abstract
Glycans are major nutrients for the human gut microbiota (HGM). Arabinogalactan proteins (AGPs) comprise a heterogeneous group of plant glycans in which a β1,3-galactan backbone and β1,6-galactan side chains are conserved. Diversity is provided by the variable nature of the sugars that decorate the galactans. The mechanisms by which nutritionally relevant AGPs are degraded in the HGM are poorly understood. Here we explore how the HGM organism Bacteroides thetaiotaomicron metabolizes AGPs. We propose a sequential degradative model in which exo-acting glycoside hydrolase (GH) family 43 β1,3-galactanases release the side chains. These oligosaccharide side chains are depolymerized by the synergistic action of exo-acting enzymes in which catalytic interactions are dependent on whether degradation is initiated by a lyase or GH. We identified two GHs that establish two previously undiscovered GH families. The crystal structures of the exo-β1,3-galactanases identified a key specificity determinant and departure from the canonical catalytic apparatus of GH43 enzymes. Growth studies of Bacteroidetes spp. on complex AGP revealed 3 keystone organisms that facilitated utilization of the glycan by 17 recipient bacteria, which included B. thetaiotaomicron. A surface endo-β1,3-galactanase, when engineered into B. thetaiotaomicron, enabled the bacterium to utilize complex AGPs and act as a keystone organism.
Affiliation(s)
- Alan Cartmell
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
- Jose Muñoz-Muñoz
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle upon Tyne, UK
- Jonathon A Briggs
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
- Didier A Ndeh
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
- Elisabeth C Lowe
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
- Arnaud Baslé
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
- Nicolas Terrapon
  - Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Aix-Marseille University, Marseille, France
  - INRA, USC 1408 AFMB, Marseille, France
- Katherine Stott
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Tiaan Heunis
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
- Joe Gray
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
- Li Yu
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Paul Dupree
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Pearl Z Fernandes
  - School of Chemistry and Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Sayali Shah
  - School of Chemistry and Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Spencer J Williams
  - School of Chemistry and Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Victoria, Australia
- Aurore Labourel
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
- Matthias Trost
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
- Bernard Henrissat
  - Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Aix-Marseille University, Marseille, France
  - INRA, USC 1408 AFMB, Marseille, France
  - Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Harry J Gilbert
  - Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne, UK
|
9
|
de Souza MM, Zerlotini A, Geistlinger L, Tizioto PC, Taylor JF, Rocha MIP, Diniz WJS, Coutinho LL, Regitano LCA. A comprehensive manually-curated compendium of bovine transcription factors. Sci Rep 2018; 8:13747. [PMID: 30213987 PMCID: PMC6137171 DOI: 10.1038/s41598-018-32146-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/17/2018] [Accepted: 08/29/2018] [Indexed: 01/28/2023] Open
Abstract
Transcription factors (TFs) are pivotal regulatory proteins that control gene expression in a context-dependent and tissue-specific manner. In contrast to humans, for which comprehensive curated TF collections exist, bovine TFs are only rudimentarily recorded and characterized. In this article, we present a manually curated compendium of 865 sequence-specific DNA-binding bovine TFs, which we analyzed for domain family distribution, evolutionary conservation, and tissue-specific expression. In addition, we provide a list of putative transcription cofactors derived from known interactions with the identified TFs. Since there is a general lack of knowledge concerning the regulation of gene expression in cattle, the curated list of TFs should provide a basis for an improved comprehension of regulatory mechanisms that are specific to the species.
Affiliation(s)
- Marcela M de Souza
  - Post-graduation Program of Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, São Paulo, 13560-970, Brazil
  - Animal Biotechnology, Embrapa Pecuária Sudeste, São Carlos, São Paulo, 13560-970, Brazil
- Adhemar Zerlotini
  - Bioinformatic Multi-user Laboratory, Embrapa Informática Agropecuária, Campinas, São Paulo, 70770-901, Brazil
- Ludwig Geistlinger
  - Animal Biotechnology, Embrapa Pecuária Sudeste, São Carlos, São Paulo, 13560-970, Brazil
- Jeremy F Taylor
  - Division of Animal Science, University of Missouri, Columbia, Missouri, 65211-5300, USA
- Marina I P Rocha
  - Post-graduation Program of Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, São Paulo, 13560-970, Brazil
- Wellison J S Diniz
  - Post-graduation Program of Evolutionary Genetics and Molecular Biology, Federal University of São Carlos, São Carlos, São Paulo, 13560-970, Brazil
- Luiz L Coutinho
  - Functional Genomic Center, University of São Paulo, Piracicaba, São Paulo, 13418-900, Brazil
- Luciana C A Regitano
  - Animal Biotechnology, Embrapa Pecuária Sudeste, São Carlos, São Paulo, 13560-970, Brazil
|
10
|
Dietary pectic glycans are degraded by coordinated enzyme pathways in human colonic Bacteroides. Nat Microbiol 2017; 3:210-219. [PMID: 29255254 PMCID: PMC5784806 DOI: 10.1038/s41564-017-0079-1] [Citation(s) in RCA: 270] [Impact Index Per Article: 33.8] [Received: 09/21/2017] [Accepted: 11/20/2017] [Indexed: 12/15/2022]
Abstract
The major nutrients available to human colonic Bacteroides species are glycans exemplified by pectins, a network of covalently linked plant cell wall polysaccharides containing galacturonic acid (GalA). Metabolism of complex carbohydrates by the Bacteroides genus is orchestrated by polysaccharide utilisation loci or PULs. In Bacteroides thetaiotaomicron, a human colonic bacterium, the PULs activated by the different pectin domains have been identified; however, the mechanism by which these loci contribute to the degradation of these GalA-containing polysaccharides is poorly understood. Here we show that each PUL orchestrates the metabolism of specific pectin molecules, recruiting enzymes from two previously unknown glycoside hydrolase (GH) families. The apparatus that depolymerizes the backbone of rhamnogalacturonan-I (RGI) is particularly complex. This system contains several GHs that trim the remnants of other pectin domains attached to RGI, while nine enzymes contribute to the degradation of the backbone comprising a rhamnose-GalA repeating unit. The catalytic properties of the pectin degrading enzymes are optimized to protect the glycan cues that activate the specific PULs, ensuring a continuous supply of inducing molecules throughout growth. The contribution of Bacteroides spp. to the metabolism of the pectic network is illustrated by cross-feeding between organisms.
|
11
|
Zielezinski A, Vinga S, Almeida J, Karlowski WM. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol 2017; 18:186. [PMID: 28974235 PMCID: PMC5627421 DOI: 10.1186/s13059-017-1319-7] [Citation(s) in RCA: 277] [Impact Index Per Article: 34.6] [Indexed: 01/30/2023]
Abstract
Alignment-free sequence analyses have been applied to problems ranging from whole-genome phylogeny to the classification of protein families, identification of horizontally transferred genes, and detection of recombined sequences. The strength of these methods makes them particularly useful for next-generation sequencing data processing and analysis. However, many researchers are unclear about how these methods work, how they compare to alignment-based methods, and what their potential is for use for their research. We address these questions and provide a guide to the currently available alignment-free sequence analysis tools.
Affiliation(s)
- Andrzej Zielezinski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614, Poznan, Poland
- Susana Vinga
- IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
- Jonas Almeida
- Stony Brook University (SUNY), 101 Nicolls Road, Stony Brook, NY, 11794, USA
- Wojciech M Karlowski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614, Poznan, Poland.
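As a rough illustration of the alignment-free idea surveyed in the review above, the following sketch compares two sequences through their k-mer (word) count profiles instead of aligning them. The function names, the default k = 3, and the use of Euclidean distance are assumptions chosen for illustration; the review covers many other word-based and information-theoretic measures.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer (word of length k) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between the k-mer count vectors of a and b.

    Hypothetical helper: one simple word-frequency measure among many.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    words = set(ca) | set(cb)
    # Counter returns 0 for words that are absent from one sequence.
    return sqrt(sum((ca[w] - cb[w]) ** 2 for w in words))
```

Because no alignment is computed, the cost grows linearly with sequence length, which is the property that makes such measures attractive for large sequencing data sets.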
|
12
|
Cartmell A, Lowe EC, Baslé A, Firbank SJ, Ndeh DA, Murray H, Terrapon N, Lombard V, Henrissat B, Turnbull JE, Czjzek M, Gilbert HJ, Bolam DN. How members of the human gut microbiota overcome the sulfation problem posed by glycosaminoglycans. Proc Natl Acad Sci U S A 2017; 114:7037-7042. [PMID: 28630303 PMCID: PMC5502631 DOI: 10.1073/pnas.1704367114] [Citation(s) in RCA: 101] [Impact Index Per Article: 12.6] [Indexed: 12/13/2022]
Abstract
The human microbiota, which plays an important role in health and disease, uses complex carbohydrates as a major source of nutrients. Utilization hierarchy indicates that the host glycosaminoglycans heparin (Hep) and heparan sulfate (HS) are high-priority carbohydrates for Bacteroides thetaiotaomicron, a prominent member of the human microbiota. The sulfation patterns of these glycosaminoglycans are highly variable, which presents a significant enzymatic challenge to the polysaccharide lyases and sulfatases that mediate degradation. It is possible that the bacterium recruits lyases with highly plastic specificities and expresses a repertoire of enzymes that target substructures of the glycosaminoglycans with variable sulfation or that the glycans are desulfated before cleavage by the lyases. To distinguish between these mechanisms, the components of the B. thetaiotaomicron Hep/HS degrading apparatus were analyzed. The data showed that the bacterium expressed a single-surface endo-acting lyase that cleaved HS, reflecting its higher molecular weight compared with Hep. Both Hep and HS oligosaccharides imported into the periplasm were degraded by a repertoire of lyases, with each enzyme displaying specificity for substructures within these glycosaminoglycans that display a different degree of sulfation. Furthermore, the crystal structures of a key surface glycan binding protein, which is able to bind both Hep and HS, and periplasmic sulfatases reveal the major specificity determinants for these proteins. The locus described here is highly conserved within the human gut Bacteroides, indicating that the model developed is of generic relevance to this important microbial community.
Affiliation(s)
- Alan Cartmell
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Elisabeth C Lowe
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Arnaud Baslé
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Susan J Firbank
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Didier A Ndeh
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Heath Murray
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Nicolas Terrapon
- Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille University, F-13288 Marseille, France
- Vincent Lombard
- Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille University, F-13288 Marseille, France
- Bernard Henrissat
- Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille University, F-13288 Marseille, France
- Institut National de la Recherche Agronomique, USC1408 Architecture et Fonction des Macromolécules Biologiques, F-13288 Marseille, France
- Department of Biological Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Jeremy E Turnbull
- Centre for Glycobiology, Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Mirjam Czjzek
- Sorbonne Universités, Université Pierre-et-Marie-Curie, Université Paris 06, F-29688 Roscoff cedex, Bretagne, France
- CNRS, UMR 8227, Integrative Biology of Marine Models, Station Biologique de Roscoff, F-29688 Roscoff cedex, Bretagne, France
- Harry J Gilbert
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- David N Bolam
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
|
13
|
Ndeh D, Rogowski A, Cartmell A, Luis AS, Baslé A, Gray J, Venditto I, Briggs J, Zhang X, Labourel A, Terrapon N, Buffetto F, Nepogodiev S, Xiao Y, Field RA, Zhu Y, O’Neil MA, Urbanowicz BR, York WS, Davies GJ, Abbott DW, Ralet MC, Martens EC, Henrissat B, Gilbert HJ. Complex pectin metabolism by gut bacteria reveals novel catalytic functions. Nature 2017; 544:65-70. [PMID: 28329766 PMCID: PMC5388186 DOI: 10.1038/nature21725] [Citation(s) in RCA: 426] [Impact Index Per Article: 53.3] [Received: 11/14/2016] [Accepted: 02/27/2017] [Indexed: 12/30/2022]
Abstract
The metabolism of carbohydrate polymers drives microbial diversity in the human gut microbiota. It is unclear, however, whether bacterial consortia or single organisms are required to depolymerize highly complex glycans. Here we show that the gut bacterium Bacteroides thetaiotaomicron uses the most structurally complex glycan known: the plant pectic polysaccharide rhamnogalacturonan-II, cleaving all but 1 of its 21 distinct glycosidic linkages. The deconstruction of rhamnogalacturonan-II side chains and backbone are coordinated to overcome steric constraints, and the degradation involves previously undiscovered enzyme families and catalytic activities. The degradation system informs revision of the current structural model of rhamnogalacturonan-II and highlights how individual gut bacteria orchestrate manifold enzymes to metabolize the most challenging glycan in the human diet.
Affiliation(s)
- Didier Ndeh
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Artur Rogowski
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Alan Cartmell
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Ana S. Luis
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Arnaud Baslé
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Joseph Gray
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Immacolata Venditto
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Jonathon Briggs
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Xiaoyang Zhang
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Aurore Labourel
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
- Nicolas Terrapon
- Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Aix-Marseille University, F-13288 Marseille, France
- Fanny Buffetto
- INRA, UR1268 Biopolymères Interactions Assemblages, 44300 Nantes, France
- Sergey Nepogodiev
- Department of Biological Chemistry, John Innes Centre Norwich Research Park, Norwich NR4 7UH, UK
- Yao Xiao
- Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI, USA
- Robert A. Field
- Department of Biological Chemistry, John Innes Centre Norwich Research Park, Norwich NR4 7UH, UK
- Yanping Zhu
- Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Road, Athens, GA 30602, USA
- Malcolm A. O’Neil
- Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Road, Athens, GA 30602, USA
- Breeana R. Urbanowicz
- Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Road, Athens, GA 30602, USA
- William S. York
- Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Road, Athens, GA 30602, USA
- Eric C. Martens
- Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI, USA
- Bernard Henrissat
- Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Aix-Marseille University, F-13288 Marseille, France
- INRA, USC 1408 AFMB, F-13288 Marseille, France
- Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Harry J. Gilbert
- Institute for Cell and Molecular Biosciences, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K
|
14
|
Weiner J, Kooij TWA. Phylogenetic profiles of all membrane transport proteins of the malaria parasite highlight new drug targets. Microb Cell 2016; 3:511-521. [PMID: 28357319 PMCID: PMC5348985 DOI: 10.15698/mic2016.10.534] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/27/2022]
Abstract
In order to combat the on-going malaria epidemic, discovery of new drug targets remains vital. Proteins that are essential to survival and specific to malaria parasites are key candidates. To survive within host cells, the parasites need to acquire nutrients and dispose of waste products across multiple membranes. Additionally, like all eukaryotes, they must redistribute ions and organic molecules between their various internal membrane bound compartments. Membrane transport proteins mediate all of these processes and are considered important mediators of drug resistance as well as drug targets in their own right. Recently, using advanced experimental genetic approaches and streamlined life cycle profiling, we generated a large collection of Plasmodium berghei gene deletion mutants and assigned essential gene functions, highlighting potential targets for prophylactic, therapeutic, and transmission-blocking anti-malarial drugs. Here, we present a comprehensive orthology assignment of all Plasmodium falciparum putative membrane transport proteins and provide a detailed overview of the associated essential gene functions obtained through experimental genetics studies in human and murine model parasites. Furthermore, we discuss the phylogeny of selected potential drug targets identified in our functional screen. We extensively discuss the results in the context of the functional assignments obtained using gene targeting available to date.
Affiliation(s)
- January Weiner
- Department of Immunology, Max Planck Institute for Infection Biology, Berlin, Germany
- Taco W A Kooij
- Department of Medical Microbiology & Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
|
15
|
Abstract
Proteins are the workhorses of the cell and, over billions of years, they have evolved an amazing plethora of extremely diverse and versatile structures with equally diverse functions. Evolutionary emergence of new proteins and transitions between existing ones are believed to be rare or even impossible. However, recent advances in comparative genomics have repeatedly identified some 10%-30% of all genes as lacking any detectable similarity to existing proteins. Even after careful scrutiny, some of those orphan genes contain protein coding reading frames with detectable transcription and translation. Thus, some proteins seem to have emerged from previously non-coding 'dark genomic matter'. These 'de novo' proteins tend to be disordered, fast evolving, and weakly expressed, but they also rapidly assume novel and physiologically important functions. Here we review mechanisms by which 'de novo' proteins might be created, under which circumstances they may become fixed and why they are elusive. We propose a 'grow slow and moult' model in which first a reading frame is extended, coding for an initially disordered and non-globular appendage which, over time, becomes more structured and may also become associated with other proteins.
|
16
|
Doğan T, MacDougall A, Saidi R, Poggioli D, Bateman A, O'Donovan C, Martin MJ. UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB. Bioinformatics 2016; 32:2264-71. [PMID: 27153729 PMCID: PMC4965628 DOI: 10.1093/bioinformatics/btw114] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/29/2015] [Revised: 01/22/2016] [Accepted: 02/25/2016] [Indexed: 11/17/2022]
Abstract
MOTIVATION: Similarity-based methods have been widely used in order to infer the properties of genes and gene products containing little or no experimental annotation. New approaches that overcome the limitations of methods that rely solely upon sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins. RESULTS: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain architectures, classifying proteins based on the similarities and propagating functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a sub-set of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. 22% of these predictions were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach. AVAILABILITY AND IMPLEMENTATION: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/ CONTACT: tdogan@ebi.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tunca Doğan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Alistair MacDougall
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Rabie Saidi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Diego Poggioli
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Claire O'Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
|
17
|
Abstract
Interactions among biological entities contain more information than purely the similarities between the entities. For example, interactions between genes, and gene products, can be more informative than the sequence similarities of the genes involved. However, the study of biological networks and their evolution in particular is still in its infancy. Simplified theoretical models of the development of biological networks from a starting state exist, but the problem of finding a distance between existing biological networks, with an unknown history, has seen less research. Metrics for network distance can also be used to measure the fit between theoretically derived networks and their real-world counterpart. In this article, we present a useful model of biological network distance and demonstrate an implementation using simulated gene regulatory networks. We compared our method with existing methods for network alignment and showed that we are much better able to identify evolutionary changes in biological networks. In particular, we can recover the evolutionary trees that describe the relationship between these networks.
Affiliation(s)
- Martin McGrane
- School of Information Technologies, The University of Sydney, Sydney, New South Wales, Australia
|
18
|
Dohmen E, Kremer LPM, Bornberg-Bauer E, Kemena C. DOGMA: domain-based transcriptome and proteome quality assessment. Bioinformatics 2016; 32:2577-81. [PMID: 27153665 DOI: 10.1093/bioinformatics/btw231] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 01/29/2016] [Accepted: 04/21/2016] [Indexed: 12/29/2022]
Abstract
MOTIVATION: Genome studies have become cheaper and easier than ever before, due to the decreased costs of high-throughput sequencing and the free availability of analysis software. However, the quality of genome or transcriptome assemblies can vary a lot. Therefore, quality assessment of assemblies and annotations is a crucial aspect of genome analysis pipelines. RESULTS: We developed DOGMA, a program for fast and easy quality assessment of transcriptome and proteome data based on conserved protein domains. DOGMA measures the completeness of a given transcriptome or proteome and provides information about domain content for further analysis. DOGMA provides a very fast way to do quality assessment within seconds. AVAILABILITY AND IMPLEMENTATION: DOGMA is implemented in Python and published under GNU GPL v.3 license. The source code is available on https://ebbgit.uni-muenster.de/domainWorld/DOGMA/ CONTACTS: e.dohmen@wwu.de or c.kemena@wwu.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, Münster 48149, Germany; Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, Recklinghausen 45665, Germany
- Lukas P M Kremer
- Institute for Evolution and Biodiversity, University of Münster, Münster 48149, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster 48149, Germany
- Carsten Kemena
- Institute for Evolution and Biodiversity, University of Münster, Münster 48149, Germany
|
19
|
Scaiewicz A, Levitt M. The language of the protein universe. Curr Opin Genet Dev 2015; 35:50-6. [PMID: 26451980 DOI: 10.1016/j.gde.2015.08.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 06/30/2015] [Revised: 08/20/2015] [Accepted: 08/25/2015] [Indexed: 11/17/2022]
Abstract
Proteins, the main cell machinery which play a major role in nearly every cellular process, have always been a central focus in biology. We live in the post-genomic era, and inferring information from massive data sets is a steadily growing universal challenge. The increasing availability of fully sequenced genomes can be regarded as the 'Rosetta Stone' of the protein universe, allowing the understanding of genomes and their evolution, just as the original Rosetta Stone allowed Champollion to decipher the ancient Egyptian hieroglyphics. In this review, we consider aspects of the protein domain architectures repertoire that are closely related to those of human languages and aim to provide some insights about the language of proteins.
Affiliation(s)
- Andrea Scaiewicz
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, CA 94305-5126, United States
|
20
|
Glycan complexity dictates microbial resource allocation in the large intestine. Nat Commun 2015; 6:7481. [PMID: 26112186 PMCID: PMC4491172 DOI: 10.1038/ncomms8481] [Citation(s) in RCA: 302] [Impact Index Per Article: 30.2] [Received: 11/11/2014] [Accepted: 05/13/2015] [Indexed: 12/20/2022]
Abstract
The structure of the human gut microbiota is controlled primarily through the degradation of complex dietary carbohydrates, but the extent to which carbohydrate breakdown products are shared between members of the microbiota is unclear. We show here, using xylan as a model, that sharing the breakdown products of complex carbohydrates by key members of the microbiota, such as Bacteroides ovatus, is dependent on the complexity of the target glycan. Characterization of the extensive xylan degrading apparatus expressed by B. ovatus reveals that the breakdown of the polysaccharide by the human gut microbiota is significantly more complex than previous models suggested, which were based on the deconstruction of xylans containing limited monosaccharide side chains. Our report presents a highly complex and dynamic xylan degrading apparatus that is fine-tuned to recognize the different forms of the polysaccharide presented to the human gut microbiota. The human gut microbiota helps us to degrade complex dietary carbohydrates such as xylan and, in turn, the carbohydrate breakdown products control the structure of the microbiota. Here the authors characterize the xylan-degrading apparatus of a key member of the gut microbiota, Bacteroides ovatus.
|
21
|
Bitard-Feildel T, Kemena C, Greenwood JM, Bornberg-Bauer E. Domain similarity based orthology detection. BMC Bioinformatics 2015; 16:154. [PMID: 25968113 PMCID: PMC4443542 DOI: 10.1186/s12859-015-0570-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 10/01/2014] [Accepted: 04/10/2015] [Indexed: 11/10/2022]
Abstract
Background: Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows in a quadratic order. A current challenge of bioinformatic research, especially when taking into account the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. Results: We present two new protein similarity measures, a cosine and a maximal weight matching score based on domain content similarity, and new software, named porthoDom. The qualities of the cosine and the maximal weight matching similarity measures are compared against curated datasets. The measures show that domain content similarities are able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, the wrapper developed for proteinortho. porthoDom makes use of domain content similarity measures to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, the reduction of the search space decreases the computational complexity of an all-against-all sequence comparison. Conclusion: We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of search space. porthoDom has the advantage of speeding up orthology detection while maintaining a degree of accuracy similar to proteinortho. The implementation of porthoDom is written in Python and C++ and is available under the GNU GPL version 3 licence at http://www.bornberglab.org/pages/porthoda.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0570-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Tristan Bitard-Feildel
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstr. 1, Münster, Germany.
- Carsten Kemena
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstr. 1, Münster, Germany.
- Jenny M Greenwood
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstr. 1, Münster, Germany.
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstr. 1, Münster, Germany.
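The cosine score on domain content mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the porthoDom implementation: it assumes each protein is given as a list of domain identifiers and uses plain occurrence counts as vector components, whereas the published measure may weight domains differently. The function name and the identifiers in the usage note are hypothetical.

```python
from collections import Counter
from math import sqrt


def domain_cosine(arr_a: list[str], arr_b: list[str]) -> float:
    """Cosine similarity between the domain-count vectors of two proteins.

    Assumption: components are raw domain occurrence counts.
    """
    ca, cb = Counter(arr_a), Counter(arr_b)
    # Only domains present in both proteins contribute to the dot product.
    dot = sum(ca[d] * cb[d] for d in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein without annotated domains matches nothing
    return dot / (norm_a * norm_b)
```

For example, `domain_cosine(["domA", "domB"], ["domA"])` is about 0.71, while two proteins that share no domain score 0.0. Pairs below a chosen threshold could then be skipped before any sequence-level comparison, which is the kind of search-space reduction the abstract describes.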
|
22
|
Kemena C, Bitard-Feildel T, Bornberg-Bauer E. MDAT- Aligning multiple domain arrangements. BMC Bioinformatics 2015; 16:19. [PMID: 25626688 PMCID: PMC4384290 DOI: 10.1186/s12859-014-0442-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/01/2014] [Accepted: 12/03/2014] [Indexed: 11/26/2022]
Abstract
Background: Proteins are composed of domains, protein segments that fold independently from the rest of the protein and have a specific function. During evolution the arrangement of domains can change: domains are gained, lost or their order is rearranged. To facilitate the analysis of these changes we propose the use of multiple domain alignments. Results: We developed an alignment program, called MDAT, which aligns multiple domain arrangements. MDAT extends earlier programs which perform pairwise alignments of domain arrangements. MDAT uses a domain similarity matrix to score domain pairs and aligns the domain arrangements using a consistency supported progressive alignment method. Conclusion: MDAT will be useful for analysing changes in domain arrangements within and between protein families and will thus provide valuable insights into the evolution of proteins and their domains. MDAT is coded in C++, and the source code is freely available for download at http://www.bornberglab.org/pages/mdat. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-014-0442-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Carsten Kemena
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstr. 1, Münster, Germany.
- Tristan Bitard-Feildel
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstr. 1, Münster, Germany.
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstr. 1, Münster, Germany.
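To make the notion of aligning domain arrangements more concrete, here is a small sketch of the pairwise building block only: a Needleman-Wunsch style global alignment in which the aligned symbols are domain identifiers rather than residues. It is not MDAT's algorithm. The fixed match, mismatch, and gap scores stand in for the domain similarity matrix mentioned in the abstract, and the progressive, consistency-supported multiple alignment step is not shown. All names and score values are assumptions.

```python
def align_domains(a, b, match=2, mismatch=-1, gap=-1):
    """Globally align two domain arrangements (lists of domain IDs).

    Returns (score, aligned_a, aligned_b), with "-" marking a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]
```

With these assumed scores, aligning `["A", "B", "C"]` against `["A", "C"]` places a gap opposite the middle domain, which is how a domain loss or gain would appear in such an alignment.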
|
23
|
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully apprehended. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor to correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implication in protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
- Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
|
24
|
Arnold R, Goldenberg F, Mewes HW, Rattei T. SIMAP--the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage. Nucleic Acids Res 2013; 42:D279-84. [PMID: 24165881 PMCID: PMC3965014 DOI: 10.1093/nar/gkt970] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022] Open
Abstract
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads.
Affiliation(s)
- Roland Arnold
- Terrence Donnelly Centre for Cellular and Biomolecular Research, Kim Lab, University of Toronto, Toronto, ON M5S 3E1, Canada, CUBE-Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1090 Vienna, Austria and Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85764 Neuherberg, Germany
|