151
Barcytė D, Zátopková M, Němcová Y, Richtář M, Yurchenko T, Jaške K, Fawley KP, Škaloud P, Ševčíková T, Fawley MW, Eliáš M. Redefining Chlorobotryaceae as one of the principal and most diverse lineages of eustigmatophyte algae. Mol Phylogenet Evol 2022; 177:107607. [PMID: 35963589 DOI: 10.1016/j.ympev.2022.107607] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/19/2022] [Revised: 07/11/2022] [Accepted: 08/05/2022] [Indexed: 10/15/2022]
Abstract
Eustigmatophyceae is one of the ∼17 classes of the vast algal phylum Ochrophyta. Over the last decade, the eustigmatophytes have emerged as an expansive group that has grown from the initially recognized handful of species to well over 200 genetically distinct entities (potential species). Yet most eustigs remain represented only by unidentified strains, or even only by metabarcode sequences obtained from environmental samples. Moreover, the formal classification of the group has not yet been harmonized with the recently uncovered diversity and phylogenetic relationships within the class. Here we make a major step towards resolving this issue by addressing the diversity, phylogeny and classification of one of the most prominent eustigmatophyte clades, previously informally called the "Eustigmataceae group". We obtained 18S rDNA and rbcL gene sequences from four new strains from the "Eustigmataceae group" and from several additional eustig strains, and performed the most comprehensive phylogenetic analyses of Eustigmatophyceae to date. The results of these analyses confirm the monophyly of the "Eustigmataceae group" and define its major subclades. We also sequenced the plastid genomes of five "Eustigmataceae group" strains, both to improve our understanding of plastid gene content evolution in eustigs and to obtain a robustly resolved eustigmatophyte phylogeny. With these new genomic data, we have solidified the view of the "Eustigmataceae group" as a well-defined family-level clade. Crucially, we have also firmly established the genus Chlorobotrys as a member of the "Eustigmataceae group". This new molecular evidence, together with a critical analysis of the literature going back to the 19th century, provided the basis for radically redefining the historical concept of the family Chlorobotryaceae as the formal taxonomic rubric corresponding to the "Eustigmataceae group".
With this change, the family names Eustigmataceae and Characiopsidaceae are reduced to synonymy with Chlorobotryaceae, which has taxonomic priority. We additionally studied in detail the morphology and ultrastructure of two Chlorobotryaceae members, which we describe as Neustupella aerophytica gen. et sp. nov. and Lietzensia polymorpha gen. et sp. nov. Finally, our analyses of partial genomic data from several Chlorobotryaceae representatives identified genes for hallmark flagellar proteins in all of these strains. The presence of these genes strongly suggests that zoosporogenesis is a common trait of the family and also occurs in members that have never been observed to produce flagellated stages. Altogether, our work paints a rich picture of one of the most diverse principal lineages of eustigmatophyte algae.
Affiliation(s)
- Dovilė Barcytė
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Chittussiho 10, 710 00 Ostrava, Czech Republic
- Martina Zátopková
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Chittussiho 10, 710 00 Ostrava, Czech Republic
- Yvonne Němcová
- Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 00 Prague, Czech Republic
- Michal Richtář
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Chittussiho 10, 710 00 Ostrava, Czech Republic
- Tatiana Yurchenko
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Chittussiho 10, 710 00 Ostrava, Czech Republic
- Karin Jaške
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Chittussiho 10, 710 00 Ostrava, Czech Republic
- Karen P Fawley
- Division of Science and Mathematics, University of the Ozarks, Clarksville, AR 72830, USA
- Pavel Škaloud
- Department of Botany, Faculty of Science, Charles University, Benátská 2, 128 00 Prague, Czech Republic
- Tereza Ševčíková
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Chittussiho 10, 710 00 Ostrava, Czech Republic
- Marvin W Fawley
- Division of Science and Mathematics, University of the Ozarks, Clarksville, AR 72830, USA
- Marek Eliáš
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Chittussiho 10, 710 00 Ostrava, Czech Republic
152
Lauber C, Seitz S. Opportunities and Challenges of Data-Driven Virus Discovery. Biomolecules 2022; 12:1073. [PMID: 36008967 PMCID: PMC9406072 DOI: 10.3390/biom12081073] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 06/30/2022] [Revised: 07/30/2022] [Accepted: 08/02/2022] [Indexed: 01/27/2023]
Abstract
Virus discovery has been fueled by new technologies ever since the first viruses were discovered at the end of the 19th century. Starting with mechanical devices that provided evidence for virus presence in sick hosts, virus discovery gradually transitioned into a sequence-based scientific discipline, which, nowadays, can characterize virus identity and explore viral diversity at an unprecedented resolution and depth. Sequencing technologies are now being used routinely and at ever-increasing scales, producing an avalanche of novel viral sequences found in a multitude of organisms and environments. In this perspective article, we argue that virus discovery has started to undergo another transformation prompted by the emergence of new approaches that are sequence data-centered and primarily computational, setting them apart from previous technology-driven innovations. The data-driven virus discovery approach is largely uncoupled from the collection and processing of biological samples, and exploits the availability of massive amounts of publicly and freely accessible data from sequencing archives. We discuss open challenges to be solved in order to unlock the full potential of data-driven virus discovery, and we highlight the benefits it can bring to classical (mostly molecular) virology and molecular biology in general.
Affiliation(s)
- Chris Lauber
- Institute for Experimental Virology, TWINCORE Centre for Experimental and Clinical Infection Research, a Joint Venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), 30625 Hannover, Germany
- Stefan Seitz
- Division of Virus-Associated Carcinogenesis (F170), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Department of Infectious Diseases, Molecular Virology, University of Heidelberg, 69120 Heidelberg, Germany
153
Sieg J, Sandmeier CC, Lieske J, Meents A, Lemmen C, Streit WR, Rarey M. Analyzing structural features of proteins from deep-sea organisms. Proteins 2022; 90:1521-1537. [PMID: 35313380 DOI: 10.1002/prot.26337] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/21/2022] [Revised: 03/10/2022] [Accepted: 03/15/2022] [Indexed: 12/31/2022]
Abstract
Protein adaptations to extreme environmental conditions are drivers of biotechnological process optimization and essential for unraveling the molecular limits of life. Most proteins with such desirable adaptations are found in extremophilic organisms inhabiting extreme environments. The deep sea is one such environment and a promising resource that imposes multiple extremes on its inhabitants. Conditions such as high hydrostatic pressure and high or low temperature are prevalent, and many deep-sea organisms tolerate several of these extremes. While molecular adaptations to high temperature are comparatively well described, adaptations to other extremes, such as high pressure, are not yet well understood. To fully unravel the molecular mechanisms of individual adaptations, it is probably necessary to disentangle multifactorial adaptations. In this study, we evaluate differences between protein structures from deep-sea organisms and their related proteins from non-deep-sea organisms. We created a data collection of 1281 experimental protein structures from 25 deep-sea organisms and paired them with orthologous proteins. We exhaustively evaluate differences between the protein pairs with machine learning and Shapley values to determine characteristic differences in sequence and structure. The results show a reasonable discrimination of deep-sea and non-deep-sea proteins, from which we distinguish correlations previously attributed to thermal stability from other signals potentially describing adaptations to high pressure. While some distinct correlations can be observed, the overall picture appears intricate.
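The abstract uses Shapley values to attribute a classifier's deep-sea versus non-deep-sea discrimination to individual sequence and structure features. The Python sketch below computes exact Shapley values by enumerating every feature coalition. It only illustrates the concept and is not the authors' pipeline, whose models and features are not described here. Two things are assumptions: features outside a coalition take a baseline value, which is one of several common value functions, and the toy model and descriptors are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(model: Callable[[list], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value (an assumed
    value function). Cost grows as 2**n, so this suits only a few features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition come from x; all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical toy "classifier score" over three made-up descriptors,
    # with an interaction between descriptors 0 and 2.
    def toy_model(z):
        return 2 * z[0] + 3 * z[1] + z[0] * z[2]

    print(shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0]))
```

For the toy model, the values come out at 2.5, 3.0 and 0.5, up to floating-point rounding. The interaction term is split equally between descriptors 0 and 2, and the three values sum to model(x) - model(baseline) = 6. Exact enumeration needs 2^n evaluations, so studies with many descriptors generally rely on approximate or model-specific Shapley estimators instead.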
Affiliation(s)
- Jochen Sieg
- Universität Hamburg, ZBH - Center for Bioinformatics, Hamburg, Germany
- Julia Lieske
- Deutsches Elektronen-Synchrotron DESY, Center for Free-Electron Laser Science, Hamburg, Germany
- Alke Meents
- Deutsches Elektronen-Synchrotron DESY, Center for Free-Electron Laser Science, Hamburg, Germany
- Wolfgang R Streit
- Universität Hamburg, Department of Microbiology and Biotechnology, Hamburg, Germany
- Matthias Rarey
- Universität Hamburg, ZBH - Center for Bioinformatics, Hamburg, Germany
154
Yang LZ, Gao BQ, Huang Y, Wang Y, Yang L, Chen LL. Multi-color RNA imaging with CRISPR-Cas13b systems in living cells. Cell Insight 2022; 1:100044. [PMID: 37192858 PMCID: PMC10120316 DOI: 10.1016/j.cellin.2022.100044] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/05/2022] [Revised: 06/09/2022] [Accepted: 06/10/2022] [Indexed: 05/18/2023]
Abstract
Visualizing RNA dynamics is important for understanding RNA function. Catalytically dead (d) CRISPR-Cas13 systems have been established to image and track RNAs in living cells, but efficient dCas13 proteins for RNA imaging are still limited. Here, we analyzed metagenomic and bacterial genomic databases to comprehensively screen Cas13 homologs for their RNA labeling capabilities in living mammalian cells. Among eight previously unreported dCas13 proteins that can be used for RNA labeling, dHgm4Cas13b and dMisCas13b displayed efficiencies comparable to, if not higher than, those of the best-known ones when targeting endogenous MUC4 and NEAT1_2 with single guide (g) RNAs. Further examination of the labeling robustness of different dCas13 systems using GCN4 repeats revealed that a minimum of 12 GCN4 repeats was required for dHgm4Cas13b and dMisCas13b imaging at the single-RNA-molecule level, whereas >24 GCN4 repeats were required for the previously reported dLwaCas13a, dRfxCas13d and dPguCas13b. Importantly, by silencing the pre-crRNA processing activity of dMisCas13b (ddMisCas13b) and further incorporating RNA aptamers, including PP7, MS2, Pepper or BoxB, into individual gRNAs, we developed a CRISPRpalette system that achieves multi-color RNA visualization in living cells.
Affiliation(s)
- Liang-Zhong Yang
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Bao-Qing Gao
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Youkui Huang
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Ying Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Li Yang
- Center for Molecular Medicine, Children's Hospital, Fudan University and Shanghai Key Laboratory of Medical Epigenetics, International Laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Ling-Ling Chen
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
155
Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun 2022; 13:4348. [PMID: 35896542 PMCID: PMC9329459 DOI: 10.1038/s41467-022-32007-7] [Citation(s) in RCA: 235] [Impact Index Per Article: 78.3] [Received: 04/01/2022] [Accepted: 07/13/2022] [Indexed: 11/29/2022]
Abstract
Protein design aims to build novel proteins customized for specific purposes, thereby holding the potential to tackle many environmental and biomedical problems. Recent progress in Transformer-based architectures has enabled the implementation of language models capable of generating human-like text. Here, motivated by this success, we describe ProtGPT2, a language model trained on the protein space that generates de novo protein sequences following the principles of natural ones. The generated proteins display natural amino acid propensities, while disorder predictions indicate that 88% of ProtGPT2-generated proteins are globular, in line with natural sequences. Sensitive sequence searches in protein databases show that ProtGPT2 sequences are distantly related to natural ones, and similarity networks further demonstrate that ProtGPT2 is sampling unexplored regions of protein space. AlphaFold prediction of ProtGPT2 sequences yields well-folded non-idealized structures with embodiments and large loops and reveals topologies not captured in current structure databases. ProtGPT2 generates sequences in a matter of seconds and is freely available.
Affiliation(s)
- Noelia Ferruz
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
- Institute of Informatics and Applications, University of Girona, Girona, Spain
- Steffen Schmidt
- Computational Biochemistry, University of Bayreuth, 95447, Bayreuth, Germany
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
156
Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022; 122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 02/06/2023]
Abstract
Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed by individual experiments; it was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There is a vast literature on the molecular modeling, simulation, and prediction of SARS-CoV-2 and on related developments of drugs, vaccines, antibodies, and diagnostics. To give readers a quick update on this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and to those who are interested in the status of the field.
Affiliation(s)
- Kaifu Gao
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Limei Cheng
- Clinical Pharmacology and Pharmacometrics, Bristol Myers Squibb, Princeton, New Jersey 08536, United States
- Jaclyn Frishcosy
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yuta Huzumi
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Tom Schluckbier
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Xiaoqi Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
157
Bacteria deplete deoxynucleotides to defend against bacteriophage infection. Nat Microbiol 2022; 7:1200-1209. [PMID: 35817891 DOI: 10.1038/s41564-022-01158-0] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 12/16/2021] [Accepted: 05/23/2022] [Indexed: 11/09/2022]
Abstract
DNA viruses and retroviruses consume large quantities of deoxynucleotides (dNTPs) when replicating. The human antiviral factor SAMHD1 takes advantage of this vulnerability in the viral lifecycle, and inhibits viral replication by degrading dNTPs into their constituent deoxynucleosides and inorganic phosphate. Here, we report that bacteria use a similar strategy to defend against bacteriophage infection. We identify a family of defensive bacterial deoxycytidine triphosphate (dCTP) deaminase proteins that convert dCTP into deoxyuracil nucleotides in response to phage infection. We also identify a family of phage resistance genes that encode deoxyguanosine triphosphatase (dGTPase) enzymes, which degrade dGTP into phosphate-free deoxyguanosine and are distant homologues of human SAMHD1. Our results suggest that bacterial defensive proteins deplete specific deoxynucleotides (either dCTP or dGTP) from the nucleotide pool during phage infection, thus starving the phage of an essential DNA building block and halting its replication. Our study shows that manipulation of the dNTP pool is a potent antiviral strategy shared by both prokaryotes and eukaryotes.
158
Tomal JH, Welch WJ, Zamar RH. Robust ranking by ensembling of diverse models and assessment metrics. J Stat Comput Simul 2022. [DOI: 10.1080/00949655.2022.2093873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Jabed H. Tomal
- Department of Mathematics and Statistics, Thompson Rivers University, Kamloops, British Columbia, Canada
- William J. Welch
- Department of Statistics, The University of British Columbia, Vancouver, British Columbia, Canada
- Ruben H. Zamar
- Department of Statistics, The University of British Columbia, Vancouver, British Columbia, Canada
159
Moi D, Nishio S, Li X, Valansi C, Langleib M, Brukman NG, Flyak K, Dessimoz C, de Sanctis D, Tunyasuvunakool K, Jumper J, Graña M, Romero H, Aguilar PS, Jovine L, Podbilewicz B. Discovery of archaeal fusexins homologous to eukaryotic HAP2/GCS1 gamete fusion proteins. Nat Commun 2022; 13:3880. [PMID: 35794124 PMCID: PMC9259645 DOI: 10.1038/s41467-022-31564-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/27/2022] [Accepted: 06/22/2022] [Indexed: 12/26/2022]
Abstract
Sexual reproduction consists of genome reduction by meiosis and subsequent gamete fusion. The presence of genes homologous to eukaryotic meiotic genes in archaea and bacteria suggests that DNA repair mechanisms evolved towards meiotic recombination. However, fusogenic proteins resembling those found in gamete fusion in eukaryotes have so far not been found in prokaryotes. Here, we identify archaeal proteins that are homologs of fusexins, a superfamily of fusogens that mediate eukaryotic gamete and somatic cell fusion, as well as virus entry. The crystal structure of a trimeric archaeal fusexin (Fusexin1 or Fsx1) reveals an archetypical fusexin architecture with unique features such as a six-helix bundle and an additional globular domain. Ectopically expressed Fusexin1 can fuse mammalian cells, and this process involves the additional globular domain and a conserved fusion loop. Furthermore, archaeal fusexin genes are found within integrated mobile elements, suggesting potential roles in cell-cell fusion and gene exchange in archaea, as well as different scenarios for the evolutionary history of fusexins.
Affiliation(s)
- David Moi
- Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE-CONICET), Buenos Aires, Argentina
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Shunsuke Nishio
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Xiaohui Li
- Department of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Clari Valansi
- Department of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Mauricio Langleib
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Unidad de Bioinformática, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Nicolas G Brukman
- Department of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Kateryna Flyak
- Department of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Genetics, Evolution and Environment, Centre for Life's Origins and Evolution, University College London, London, UK
- Department of Computer Science, University College London, London, UK
- Martin Graña
- Unidad de Bioinformática, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Héctor Romero
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Centro Universitario Regional Este - CURE, Centro Interdisciplinario de Ciencia de Datos y Aprendizaje Automático - CICADA, Universidad de la República, Montevideo, Uruguay
- Pablo S Aguilar
- Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE-CONICET), Buenos Aires, Argentina
- Instituto de Investigaciones Biotecnológicas Universidad Nacional de San Martín (IIB-CONICET), San Martín, Buenos Aires, Argentina
- Luca Jovine
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
160
Méheust R, Castelle CJ, Jaffe AL, Banfield JF. Conserved and lineage-specific hypothetical proteins may have played a central role in the rise and diversification of major archaeal groups. BMC Biol 2022; 20:154. [PMID: 35790962 PMCID: PMC9258230 DOI: 10.1186/s12915-022-01348-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/27/2022] [Accepted: 06/09/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Archaea play fundamental roles in the environment, for example through methane production and consumption, ammonia oxidation, protein degradation, carbon compound turnover, and sulfur compound transformations. Recent genomic analyses have profoundly reshaped our understanding of the distribution and functionalities of Archaea and of their roles in eukaryotic evolution. RESULTS Here, 1179 representative genomes were selected from 3197 archaeal genomes. Clustering the representative genomes by their content of 10,866 newly defined archaeal protein families (which will serve as a community resource) recapitulates archaeal phylogeny. We identified the co-occurring proteins that distinguish the major lineages. Those with metabolic roles were consistent with experimental data. However, two families specific to Asgard archaea were determined to be new eukaryotic signature proteins. Overall, the blocks of lineage-specific families are dominated by proteins that lack functional predictions. CONCLUSIONS Given that these hypothetical proteins are near-ubiquitous within major archaeal groups, we propose that they were important in the origin of most of the major archaeal lineages. Interestingly, although there were clearly phylum-specific co-occurring proteins, no such blocks of protein families were shared across superphyla, suggesting a burst-like origin of new lineages early in archaeal evolution.
Affiliation(s)
- Raphaël Méheust
- Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- LABGeM, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, Evry, France
- Cindy J Castelle
- Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Alexander L Jaffe
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Jillian F Banfield
- Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA
161
Chu Y, Zhao Z, Cai L, Zhang G. Viral diversity and biogeochemical potential revealed in different prawn-culture sediments by virus-enriched metagenome analysis. Environ Res 2022; 210:112901. [PMID: 35227678 DOI: 10.1016/j.envres.2022.112901] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/04/2021] [Revised: 02/01/2022] [Accepted: 02/03/2022] [Indexed: 06/14/2023]
Abstract
As the most numerous biological entities on Earth, viruses affect microbial dynamics, metabolism and biogeochemical cycles in aquatic ecosystems. Viral diversity and function in the ocean have been relatively well studied, but our understanding of viruses in mariculture systems is limited. To fill this knowledge gap, we studied the viral diversity and potential biogeochemical impacts of sediments from four different prawn-mariculture ecosystems (monoculture of prawn, and polyculture of prawn with jellyfish, sea cucumber, or clam) using a metagenomic approach with prior separation of virus-like particles (VLPs). We found that the order Caudovirales was the predominant viral group, accounting for 78.39% of classified viruses. A phylogenetic tree of the terL gene confirmed the high diversity of sediment viruses and identified three potential novel clades. Moreover, a gene-sharing network comparison with viruses inhabiting other ecosystems revealed that mariculture sediments harbor considerable unexplored viral diversity and that maricultural species are potentially important drivers of viral community structure. Notably, the viral auxiliary metabolic genes identified suggest that viruses influence carbon and sulfur cycling, as well as cofactor/vitamin and amino acid metabolism, and thereby participate indirectly in biogeochemical cycling. Overall, our findings reveal the genomic diversity and ecological function of viral communities in prawn mariculture sediments and highlight the role of viruses in microbial ecology and biogeochemistry.
Affiliation(s)
- Yunmeng Chu
- Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, 361021, Fujian, China
- Zelong Zhao
- Shanghai BIOZERON Biotechnology Co., Ltd., Shanghai, 201800, China
- Lixi Cai
- Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, 361021, Fujian, China
- Faculty of Basic Medicine, Putian University, Putian, 351100, Fujian, China
- Guangya Zhang
- Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, 361021, Fujian, China
162
S51 Family Peptidases Provide Resistance to Peptidyl-Nucleotide Antibiotic McC. mBio 2022; 13:e0080522. [PMID: 35467414 PMCID: PMC9239234 DOI: 10.1128/mbio.00805-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Microcin C (McC)-like compounds are natural Trojan horse peptide-nucleotide antibiotics produced by diverse bacteria. The ribosomally synthesized peptide parts of these antibiotics are responsible for their facilitated transport into susceptible cells. Once inside the cell, the peptide part is degraded, releasing the toxic payload, an isoaspartyl-nucleotide that inhibits aspartyl-tRNA synthetase, an enzyme essential for protein synthesis. Bacteria that produce microcin C-like compounds have evolved multiple ways to avoid self-intoxication. Here, we describe a new resistance strategy mediated by S51 family peptidases, which we name MccG. MccG cleaves the toxic isoaspartyl-nucleotide, rendering it inactive. While some MccG homologs are encoded by gene clusters responsible for the biosynthesis of McC-like compounds, most are encoded by standalone genes whose products may provide a basal level of resistance to peptide-nucleotide antibiotics in phylogenetically distant bacteria.
163
Pereira J, Lupas AN. New β-Propellers Are Continuously Amplified From Single Blades in all Major Lineages of the β-Propeller Superfamily. Front Mol Biosci 2022; 9:895496. [PMID: 35755816 PMCID: PMC9218822 DOI: 10.3389/fmolb.2022.895496] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/13/2022] [Accepted: 05/13/2022] [Indexed: 11/13/2022]
Abstract
β-Propellers are toroidal folds in which consecutive supersecondary structure units of four antiparallel β-strands, called blades, are arranged radially around a central axis. Uniquely among toroidal folds, blades span the full range of sequence symmetry, from near identity to complete divergence, indicating an ongoing process of amplification and differentiation. We have proposed that the major lineages of β-propellers arose through this mechanism and that their last common ancestor was therefore a single blade, not a fully formed β-propeller. Here we show that this process of amplification and differentiation is also widespread within individual lineages, yielding β-propellers whose blades share more than 60% pairwise sequence identity in most major β-propeller families. In some cases, the blades are nearly identical, indicating a very recent amplification event; yet even where such recently amplified β-propellers share more than 80% overall sequence identity with each other, comparison of their DNA sequences shows that the amplification occurred independently.
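Blade similarity above is expressed as pairwise sequence identity. The sketch below computes it for two blades that are already aligned, assuming '-' marks gaps and identity is counted only over columns where neither sequence has a gap; this is one common convention, not necessarily the one the authors used, and the blade fragments are hypothetical toy strings.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip any column that contains a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned blade fragments (not real sequences)
print(percent_identity("GDVTSLAW-", "GDVTALSWS"))
```

Other conventions divide by the full alignment length or by the shorter sequence, which can change the value noticeably for gapped alignments.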
Affiliation(s)
- Joana Pereira
- Department of Protein Evolution, Max Planck Institute for Biology, Tübingen, Germany
- Andrei N Lupas
- Department of Protein Evolution, Max Planck Institute for Biology, Tübingen, Germany
164
Jang J, Reed PMM, Rauscher S, Woolley GA. Point (S-to-G) Mutations in the W(S/G)GE Motif in Red/Green Cyanobacteriochrome GAF Domains Enhance Thermal Reversion Rates. Biochemistry 2022; 61:1444-1455. [PMID: 35759789 DOI: 10.1021/acs.biochem.2c00060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Cyanobacteriochromes (CBCRs) are photoreceptors consisting of single or tandem GAF (cGMP-phosphodiesterase/adenylate cyclase/FhlA) domains that bind bilin chromophores. Canonical red/green CBCR GAF domains are a well-characterized subgroup of the expanded red/green CBCR GAF domain family that binds phycocyanobilin (PCB) and converts between a thermally stable red-absorbing Pr state and a green-absorbing Pg state. The rate of thermal reversion from Pg to Pr varies widely among canonical red/green CBCR GAF domains, with half-lives ranging from days to seconds. Since the thermal reversion rate is an important parameter for the application of CBCR GAF domains as optogenetic tools, the molecular factors controlling the thermal reversion rate are of particular interest. Here, we report that point mutations in a well-conserved W(S/G)GE motif alter reversion rates in canonical red/green CBCR GAF domains in a predictable manner. Specifically, S-to-G mutations enhance thermal reversion rates, while the reverse, G-to-S mutations slow thermal reversion. Despite the distance (>10 Å) of the mutation site from the chromophore, molecular dynamics simulations and nuclear magnetic resonance (NMR) analyses suggest that the presence of a glycine residue allows the formation of a water bridge that alters the conformational dynamics of chromophore-interacting residues, leading to enhanced Pg to Pr thermal reversion.
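The half-lives quoted above can be converted into rate constants. Assuming the Pg → Pr thermal reversion is a first-order process (an assumption for illustration; the abstract does not state the kinetic model), the Pg population decays exponentially and the half-life does not depend on concentration:

```latex
[\mathrm{P_g}](t) = [\mathrm{P_g}]_0 \, e^{-k t}, \qquad
t_{1/2} = \frac{\ln 2}{k}
```

For example, a half-life of one day (86,400 s) corresponds to k ≈ 0.693 / 86,400 s ≈ 8.0 × 10⁻⁶ s⁻¹, whereas a half-life of one second corresponds to k ≈ 0.69 s⁻¹, so a range from days to seconds spans roughly five orders of magnitude or more in k.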
Affiliation(s)
- Jaewan Jang
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- P Maximilian M Reed
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Sarah Rauscher
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada; Department of Chemical and Physical Sciences, University of Toronto Mississauga, 3359 Mississauga Road North, Mississauga, Ontario, L5L 1C6, Canada; Department of Physics, University of Toronto, 60 St. George Street, Toronto, Ontario, M5S 1A7, Canada
- G Andrew Woolley
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
165
166
Schauperl M, Denny RA. AI-Based Protein Structure Prediction in Drug Discovery: Impacts and Challenges. J Chem Inf Model 2022; 62:3142-3156. [PMID: 35727311 DOI: 10.1021/acs.jcim.2c00026] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Indexed: 12/12/2022]
Abstract
Proteins are the molecular machinery of the human body, and their malfunction is often responsible for disease, making them crucial targets for drug discovery. The three-dimensional structure of a protein determines its biological function, and its conformational state determines the binding of substrates, cofactors, and other proteins. Rational drug discovery employs engineered small molecules that selectively interact with proteins to modulate their function. To selectively target a protein and design small molecules against it, knowing the protein structure in all its relevant conformations is critical. Unfortunately, for a large number of proteins relevant to drug discovery, the three-dimensional structure has not yet been solved experimentally. Therefore, accurately predicting protein structure from the amino acid sequence is one of the grand challenges in biology. Recently, AlphaFold2, a machine learning application based on a deep neural network, predicted unknown protein structures with unprecedented accuracy. Despite the impressive progress made by AlphaFold2, nature still challenges the field of structure prediction. In this Perspective, we explore how AlphaFold2 and related methods help make drug design more efficient. Furthermore, we discuss the roles of predicting domain-domain orientations, all relevant conformational states, the influence of posttranslational modifications, and conformational changes due to protein binding partners. We highlight where further improvements are needed for advanced machine learning methods to be used successfully and frequently in the pharmaceutical industry.
Affiliation(s)
- Michael Schauperl
- Department of Computational Sciences, HotSpot Therapeutics, 50 Milk Street, Boston, Massachusetts 02110, United States
- Rajiah Aldrin Denny
- Department of Computational Sciences, HotSpot Therapeutics, 50 Milk Street, Boston, Massachusetts 02110, United States
167
Abstract
Subcellular compartmentalization is a defining feature of all cells. In prokaryotes, compartmentalization is generally achieved via protein-based strategies. The two main classes of microbial protein compartments are bacterial microcompartments and encapsulin nanocompartments. Encapsulins self-assemble into proteinaceous shells with diameters between 24 and 42 nm and are defined by the viral HK97-fold of their shell protein. Encapsulins have the ability to encapsulate dedicated cargo proteins, including ferritin-like proteins, peroxidases, and desulfurases. Encapsulation is mediated by targeting sequences present in all cargo proteins. Encapsulins are found in many bacterial and archaeal phyla and have been suggested to play roles in iron storage, stress resistance, sulfur metabolism, and natural product biosynthesis. Phylogenetic analyses indicate that they share a common ancestor with viral capsid proteins. Many pathogens encode encapsulins, and recent evidence suggests that they may contribute toward pathogenicity. The existing information on encapsulin structure, biochemistry, biological function, and biomedical relevance is reviewed here.
Affiliation(s)
- Tobias W. Giessen
- Departments of Biomedical Engineering and Biological Chemistry, University of Michigan Medical School, Ann Arbor, Michigan, USA
168
Nishimura Y, Yoshizawa S. The OceanDNA MAG catalog contains over 50,000 prokaryotic genomes originated from various marine environments. Sci Data 2022; 9:305. [PMID: 35715423 PMCID: PMC9205870 DOI: 10.1038/s41597-022-01392-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 10/06/2021] [Accepted: 05/12/2022] [Indexed: 12/22/2022]
Abstract
Marine microorganisms are immensely diverse and play fundamental roles in global geochemical cycling. Recent metagenome-assembled genome studies, particularly large-scale projects such as Tara Oceans, have expanded the genomic repertoire of marine microorganisms. However, published marine metagenome data remain underexplored. We collected 2,057 marine metagenomes covering various marine environments and developed a new genome reconstruction pipeline. We reconstructed 52,325 qualified genomes composed of 8,466 prokaryotic species-level clusters spanning 59 phyla, including genomes from the deep sea (deeper than 1,000 m; n = 3,337), low-oxygen zones (<90 μmol O2 per kg of water; n = 7,884), and polar regions (n = 7,752). Novelty evaluation using a genome taxonomy database shows that 6,256 species (73.9%) are novel and include genomes of high taxonomic novelty, such as new class candidates. These genomes collectively expanded the known phylogenetic diversity of marine prokaryotes by 34.2%, and the species representatives cover 26.5-42.0% of prokaryote-enriched metagenomes. By thoroughly leveraging accumulated metagenomic data, this genome resource, named the OceanDNA MAG catalog, illuminates uncharacterized marine microbial 'dark matter' lineages.
Affiliation(s)
- Yosuke Nishimura
- Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba, 277-8564, Japan.
- Research Center for Bioscience and Nanoscience (CeBN), Research Institute for Marine Resources Utilization, Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa, 237-0061, Japan.
- Susumu Yoshizawa
- Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba, 277-8564, Japan
- Graduate School of Frontier Sciences, The University of Tokyo, Chiba, 277-8563, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo, 113-8657, Japan
169
Ibarra-Laclette E, Venancio-Rodríguez CA, Vásquez-Aguilar AA, Alonso-Sánchez AG, Pérez-Torres CA, Villafán E, Ramírez-Barahona S, Galicia S, Sosa V, Rebollar EA, Lara C, González-Rodríguez A, Díaz-Fleisher F, Ornelas JF. Transcriptional Basis for Haustorium Formation and Host Establishment in Hemiparasitic Psittacanthus schiedeanus Mistletoes. Front Genet 2022; 13:929490. [PMID: 35769994 PMCID: PMC9235361 DOI: 10.3389/fgene.2022.929490] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/26/2022] [Accepted: 05/20/2022] [Indexed: 11/13/2022]
Abstract
The mistletoe Psittacanthus schiedeanus, a keystone species in interaction networks between plants, pollinators, and seed dispersers, infects a wide range of native and non-native tree species of commercial interest. Here, using RNA-seq, we assembled the complete circularized quadripartite structure of the P. schiedeanus chloroplast genome and described changes in nuclear gene expression over time in experimentally inoculated seeds. Of the 140,467 assembled and annotated uniGenes, 2,000 were identified as differentially expressed genes (DEGs) and were classified into six distinct clusters according to their expression profiles. DEGs were also classified into enriched functional categories related to auxin and jasmonic acid synthesis, signaling, homoeostasis, and response. Since many of their orthologs are involved in lateral or adventitious root formation in other plant species, we propose that in P. schiedeanus (and perhaps in other rootless mistletoe species), these genes participate in haustorium formation through the complex regulatory networks described here. Lastly, based on the structural similarities between P. schiedeanus enzymes and those involved in host cell wall degradation in fungi, we suggest that mistletoe species secrete a similar enzymatic arsenal extracellularly and use it to readily parasitize and break through host tissues.
Affiliation(s)
- Enrique Ibarra-Laclette
- Instituto de Ecología A.C. (INECOL), Red de Estudios Moleculares Avanzados (REMAv), Xalapa, Mexico
- Claudia-Anahí Pérez-Torres
- Instituto de Ecología A.C. (INECOL), Red de Estudios Moleculares Avanzados (REMAv), Xalapa, Mexico
- CONACyT "Investigador por México" researcher, Instituto de Ecología A.C. (INECOL), Xalapa, Mexico
- Emanuel Villafán
- Instituto de Ecología A.C. (INECOL), Red de Estudios Moleculares Avanzados (REMAv), Xalapa, Mexico
- Santiago Ramírez-Barahona
- Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de Mexico (UNAM), Mexico City, Mexico
- Sonia Galicia
- Instituto de Ecología A.C. (INECOL), Red de Biología Evolutiva, Xalapa, Mexico
- Victoria Sosa
- Instituto de Ecología A.C. (INECOL), Red de Biología Evolutiva, Xalapa, Mexico
- Eria A. Rebollar
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de Mexico, Cuernavaca, Mexico
- Carlos Lara
- Centro de Investigación en Ciencias Biológicas, Universidad Autónoma de Tlaxcala, Tlaxcala, Mexico
- Antonio González-Rodríguez
- Laboratorio de Genética de la Conservación, Instituto de Investigaciones en Ecosistemas y Sustentabilidad (IIES), UNAM, Morelia, Mexico
170
Lee SJ, Joo K, Sim S, Lee J, Lee IH, Lee J. CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields. Molecules 2022; 27:3711. [PMID: 35744836 PMCID: PMC9231382 DOI: 10.3390/molecules27123711] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/23/2022] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 11/16/2022]
Abstract
Sequence-structure alignment of protein sequences is an important task for the template-based modeling of 3D protein structures. Building a reliable sequence-structure alignment is challenging, especially for target proteins that are only remote homologues of known structures. We built a sequence-structure alignment method called CRFalign, which improves upon a base alignment model built on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented as a set of gradient-boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed, including secondary structure and solvent accessibility. Training is performed on reference alignments at the superfamily or twilight-zone level chosen from the SABmark benchmark set. We found that CRFalign yields a relative improvement in average alignment accuracy on the SABmark validation sets. We also tested CRFalign on 51 sequence-structure pairs involving 15 free-modeling (FM) target domains of CASP14, where CRFalign improved average modeling accuracy on these hard targets (TM-CRFalign ≃ 42.94%) compared with HHalign (TM-HHalign ≃ 39.05%) and MRFalign (TM-MRFalign ≃ 36.93%). CRFalign was incorporated into our template search framework, CRFpred, which showed reasonable template search performance on a random set of 300 target proteins divided into Easy, Medium, and Hard sets.
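The modeling accuracies above are reported as TM-scores. Under the widely used definition, for a given superposition each aligned residue pair contributes 1/(1 + (d_i/d_0)^2), where d_i is the distance between the paired C-alpha atoms and d_0 = 1.24(L - 15)^(1/3) - 1.8 Å depends on the target length L; the sum is divided by L. The sketch below evaluates that expression for one fixed superposition, assuming the distances are already computed. The full TM-score additionally searches for the superposition that maximises the score, which is omitted here, and clamping d_0 to 0.5 Å for short chains follows common practice but is an assumption.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) used by TM-score."""
    if target_length <= 15:
        return 0.5  # the cube-root term is not usable for very short chains
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # clamp small values, as is commonly done


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: C-alpha distances (angstroms) for each aligned residue pair.
    target_length: number of residues in the target (normalisation length).
    Unaligned target residues contribute 0, so they are simply absent.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect model of a 100-residue target scores 1.0;
# every aligned pair sitting exactly d0 apart scores 0.5.
print(tm_score([0.0] * 100, 100))
print(tm_score([tm_d0(100)] * 100, 100))
```

Because the sum is normalised by the target length, a model that aligns only part of the target is penalised, which is why the score suits hard free-modeling targets.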
Affiliation(s)
- Sung Jong Lee
- Basic Science Institute, Changwon National University, Changwon 51140, Korea
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Juyong Lee
- Department of Chemistry, Kangwon National University, Chuncheon 24341, Korea
- In-Ho Lee
- Korea Research Institute of Standards and Science (KRISS), Daejeon 34113, Korea
- Jooyoung Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
171
Complete Genome Sequence of a Phage Infecting Sphingomonadaceae. Microbiol Resour Announc 2022; 11:e0036622. [PMID: 35652632 PMCID: PMC9302152 DOI: 10.1128/mra.00366-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
We isolated a phage infecting a member of the Sphingomonadaceae family from a freshwater lake. The phage has a DNA genome of 41,771 bp, with a GC content of 61.7%. The genome harbors 50 predicted protein-coding genes and an auxiliary metabolic gene, which encodes a protein belonging to the radical S-adenosylmethionine superfamily.
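A GC content such as the 61.7% reported here is the fraction of G and C among all bases of the genome. A minimal sketch, assuming ambiguous bases (such as N) are excluded from the denominator; the example sequence is a toy string, not the phage genome.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G+C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


print(gc_content("ATGCGCNN"))  # the two Ns are ignored, so 4 of 6 bases are G/C
```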
172
Borges AL, Lou YC, Sachdeva R, Al-Shayeb B, Penev PI, Jaffe AL, Lei S, Santini JM, Banfield JF. Widespread stop-codon recoding in bacteriophages may regulate translation of lytic genes. Nat Microbiol 2022; 7:918-927. [PMID: 35618772 PMCID: PMC9197471 DOI: 10.1038/s41564-022-01128-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/01/2021] [Accepted: 04/21/2022] [Indexed: 11/09/2022]
Abstract
Bacteriophages (phages) are obligate parasites that use host bacterial translation machinery to produce viral proteins. However, some phages have alternative genetic codes with reassigned stop codons that are predicted to be incompatible with bacterial translation systems. We analysed 9,422 phage genomes and found that stop-codon recoding has evolved in diverse clades of phages that infect bacteria present in both human and animal gut microbiota. Recoded stop codons are particularly over-represented in phage structural and lysis genes. We propose that recoded stop codons might function to prevent premature production of late-stage proteins. Stop-codon recoding has evolved several times in closely related lineages, which suggests that adaptive recoding can occur over very short evolutionary timescales.
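Stop-codon recoding shows up as in-frame stop codons inside genes, which the standard genetic code would read as premature terminations. The sketch below counts such internal occurrences of one stop codon in a coding sequence. Treating TAG as the reassigned codon is a hypothetical choice for illustration, and this is not the authors' pipeline, which the abstract does not describe.

```python
def codons(cds: str) -> list:
    """Split a coding sequence into frame-0 triplets, dropping a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def internal_stop_count(cds: str, stop: str = "TAG") -> int:
    """Count in-frame `stop` codons before the final codon.

    Under the standard code each hit would end translation early; under a code
    that reassigns `stop` to an amino acid, translation reads through it.
    """
    triplets = codons(cds)
    return sum(1 for codon in triplets[:-1] if codon == stop.upper())


# Hypothetical gene: ATG TAG AAA TAA -> one internal TAG before the TAA terminator
print(internal_stop_count("ATGTAGAAATAA"))
```

Comparing such counts across gene categories (for example structural or lysis genes versus the rest) is one simple way to look for the kind of over-representation described above.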
Affiliation(s)
- Adair L Borges
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Environmental Science, Policy and Management, University of California, Berkeley, CA, USA
- Yue Clare Lou
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Rohan Sachdeva
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Earth and Planetary Science, University of California, Berkeley, CA, USA
- Basem Al-Shayeb
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Petar I Penev
- Earth and Planetary Science, University of California, Berkeley, CA, USA
- Alexander L Jaffe
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Shufei Lei
- Earth and Planetary Science, University of California, Berkeley, CA, USA
- Joanne M Santini
- Department of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK
- Jillian F Banfield
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Environmental Science, Policy and Management, University of California, Berkeley, CA, USA
- Earth and Planetary Science, University of California, Berkeley, CA, USA
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- The University of Melbourne, Parkville, Victoria, Australia
173
Heinzinger M, Littmann M, Sillitoe I, Bordin N, Orengo C, Rost B. Contrastive learning on protein embeddings enlightens midnight zone. NAR Genom Bioinform 2022; 4:lqac043. [PMID: 35702380 PMCID: PMC9188115 DOI: 10.1093/nargab/lqac043] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 11/22/2021] [Revised: 03/25/2022] [Accepted: 05/17/2022] [Indexed: 12/23/2022]
Abstract
Experimental structures are leveraged through multiple sequence alignments or, more generally, through homology-based inference (HBI), facilitating the transfer of information from a protein with known annotation to an unannotated query. A recent alternative expands the concept of HBI from sequence-distance lookup to embedding-based annotation transfer (EAT). These embeddings are derived from protein language models (pLMs). Here, we introduce the use of single-protein representations from pLMs for contrastive learning. This learning procedure creates a new set of embeddings that optimizes constraints captured by the hierarchical classification of protein 3D structures defined by the CATH resource. The approach, dubbed ProtTucker, recognizes distant homologous relationships better than more traditional techniques such as threading or fold recognition. Thus, these embeddings allow sequence comparison to step into the 'midnight zone' of protein similarity, i.e., the region in which distantly related sequences have seemingly random pairwise sequence similarity. The novelty of this work lies in the particular combination of tools and sampling techniques that achieved performance comparable to or better than existing state-of-the-art sequence comparison methods. Additionally, because this method does not need to generate alignments, it is also orders of magnitude faster. The code is available at https://github.com/Rostlab/EAT.
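Embedding-based annotation transfer can be pictured as a nearest-neighbour lookup: the query inherits the annotation of the closest embedding in a labelled reference set. A minimal sketch, assuming Euclidean distance and a single nearest neighbour; the 3-dimensional vectors and the labels are hypothetical stand-ins for real pLM or ProtTucker embeddings and CATH classes, and the function name `eat_transfer` is illustrative only, not part of the EAT repository.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def eat_transfer(
    query: Sequence[float], lookup: Dict[str, Sequence[float]]
) -> Tuple[Optional[str], float]:
    """Return (label, distance) of the reference embedding nearest to `query`."""
    best_label, best_dist = None, math.inf
    for label, embedding in lookup.items():
        dist = math.dist(query, embedding)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy reference set: labels and vectors are made up for illustration.
lookup = {
    "superfamily_A": [0.9, 0.1, 0.0],
    "superfamily_B": [0.0, 0.8, 0.3],
}
label, dist = eat_transfer([0.8, 0.2, 0.1], lookup)
print(label, round(dist, 3))
```

No alignment is computed at query time, only vector distances, which is consistent with the speed advantage described above; the distance to the nearest hit can also serve as a rough reliability signal, with any acceptance threshold being a design choice.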
Affiliation(s)
- Michael Heinzinger
- TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Maria Littmann
- TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Burkhard Rost
- TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching, Germany & TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
174
Bileschi ML, Belanger D, Bryant DH, Sanderson T, Carter B, Sculley D, Bateman A, DePristo MA, Colwell LJ. Using deep learning to annotate the protein universe. Nat Biotechnol 2022; 40:932-937. [PMID: 35190689 DOI: 10.1038/s41587-021-01179-w] [Citation(s) in RCA: 127] [Impact Index Per Article: 42.3] [Received: 05/06/2021] [Accepted: 12/02/2021] [Indexed: 12/30/2022]
Abstract
Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from unseen families. Combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information. This approach extends the coverage of Pfam by >9.5%, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools.
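Models that read unaligned sequences need a numeric encoding of each residue. A common choice is one-hot encoding over the 20 standard amino acids; the sketch below shows it under the assumption that non-standard residues (such as X) map to all-zero rows. The paper's exact input encoding is not stated in the abstract, so treat this as illustrative rather than as the authors' preprocessing.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one column each
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str) -> list:
    """Encode a protein sequence as one 20-element 0/1 row per residue.

    Residues outside the standard alphabet become all-zero rows (an assumption).
    """
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if residue in INDEX:
            row[INDEX[residue]] = 1
        rows.append(row)
    return rows


encoded = one_hot("MKX")
print(len(encoded), [row.index(1) if 1 in row else None for row in encoded])
```

A variable-length matrix like this can be fed directly to a convolutional network, which is one way to classify sequences without first building an alignment.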
Affiliation(s)
- Theo Sanderson
- Google Research, Cambridge, MA, USA
- The Francis Crick Institute, London, UK
- Brandon Carter
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Mark A DePristo
- Google Research, Cambridge, MA, USA
- BigHat Biosciences, San Mateo, CA, USA
- Lucy J Colwell
- Google Research, Cambridge, MA, USA
- Department of Chemistry, University of Cambridge, Cambridge, UK
175
Lin L, Capozzoli R, Ferrand A, Plum M, Vettiger A, Basler M. Subcellular localization of Type VI secretion system assembly in response to cell–cell contact. EMBO J 2022; 41:e108595. [PMID: 35634969 PMCID: PMC9251886 DOI: 10.15252/embj.2021108595] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 04/28/2021] [Revised: 04/18/2022] [Accepted: 04/29/2022] [Indexed: 11/13/2022]
Abstract
Bacteria require a number of systems, including the type VI secretion system (T6SS), for interbacterial competition and pathogenesis. The T6SS is a large nanomachine that can deliver toxins directly across the membranes of proximal target cells. Because the T6SS must be largely reassembled after each secretion event, accurate timing and localization of its assembly can lower the cost of protein translocation. Although critically important, the mechanisms underlying the spatiotemporal regulation of T6SS assembly remain poorly understood. Here, we used super‐resolution live‐cell imaging to show that while Acinetobacter and Burkholderia thailandensis can assemble the T6SS at any site, a significant subset of T6SS assemblies localizes precisely to the site of contact between neighboring bacteria. We identified a class of diverse, previously uncharacterized periplasmic proteins (TslA) required for this dynamic localization of the T6SS to cell–cell contact. This precise localization also depends on the outer membrane porin OmpA. Our analysis links transmembrane communication to the accurate timing and localization of T6SS assembly and uncovers a pathway allowing bacterial cells to respond to cell–cell contact during interbacterial competition.
Affiliation(s)
- Lin Lin
- Biozentrum, University of Basel, Basel, Switzerland
- Alexia Ferrand
- Biozentrum Imaging Core Facility, University of Basel, Basel, Switzerland
- Miro Plum
- Biozentrum, University of Basel, Basel, Switzerland
- Marek Basler
- Biozentrum, University of Basel, Basel, Switzerland
176
León-González JA, Flatet P, Juárez-Ramírez MS, Farías-Rico JA. Folding and Evolution of a Repeat Protein on the Ribosome. Front Mol Biosci 2022; 9:851038. [PMID: 35707224 PMCID: PMC9189291 DOI: 10.3389/fmolb.2022.851038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Accepted: 04/27/2022] [Indexed: 12/04/2022]
Abstract
Life on Earth is the result of the work of proteins, the cellular nanomachines that fold into elaborate 3D structures to perform their functions. The ribosome synthesizes all the proteins of the biosphere, and many of them begin to fold during translation in a process known as cotranslational folding. In this work, we discuss current advances in this field and provide computational and experimental data that highlight the role of the ribosome in the evolution of protein structures. First, we used the sequence of the Ankyrin domain from the Drosophila Notch receptor to launch a deep sequence-based search. With this strategy, we found a conserved 33-residue motif shared by different protein folds. Then, to see how the vectorial addition of the motif would generate a full structure, we measured the folding of the Ankyrin repeat protein on the ribosome. Not only are the on-ribosome folding data in full agreement with classical in vitro biophysical measurements, but they also provide experimental evidence of how folded proteins could have evolved by duplication and fusion of smaller fragments in the RNA world. Overall, we discuss how the ribosomal exit tunnel could be conceptualized as an active site that is under evolutionary pressure to influence protein folding.
Affiliation(s)
- José Alberto León-González
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- Perline Flatet
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- María Soledad Juárez-Ramírez
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- José Arcadio Farías-Rico
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- *Correspondence: José Arcadio Farías-Rico,
177
Genome-Wide Identification, Classification, Expression and Duplication Analysis of bZIP Family Genes in Juglans regia L. Int J Mol Sci 2022; 23:ijms23115961. [PMID: 35682645 PMCID: PMC9180593 DOI: 10.3390/ijms23115961] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/07/2022] [Revised: 05/21/2022] [Accepted: 05/24/2022] [Indexed: 01/08/2023]
Abstract
Basic leucine zipper (bZIP), a conserved transcription factor widely found in eukaryotes, has important regulatory roles in plant growth. To understand the information related to the bZIP gene family in walnut, 88 JrbZIP genes were identified at the genome-wide level and classified into 13 subfamilies (A, B, C, D, E, F, G, H, I, J, K, M, and S) using a bioinformatic approach. The number of exons in JrbZIPs ranged from 1 to 12, the number of amino acids in JrbZIP proteins ranged from 145 to 783, and the isoelectric point ranged from 4.85 to 10.05. The majority of JrbZIP genes were localized in the nucleus. The promoter prediction results indicated that the walnut bZIP gene contains a large number of light-responsive and jasmonate-responsive action elements. The 88 JrbZIP genes were involved in DNA binding and nucleus and RNA biosynthetic processes of three ontological categories, molecular functions, cellular components and biological processes. The codon preference analysis showed that the bZIP gene family has a stronger bias for AGA, AGG, UUG, GCU, GUU, and UCU than other codons. Moreover, the transcriptomic data showed that JrbZIP genes might play an important role in floral bud differentiation. The results of a protein interaction network map and kegg enrichment analysis indicated that bZIP genes were mainly involved in phytohormone signaling, anthocyanin synthesis and flowering regulation. qRT-PCR demonstrated the role of the bZIP gene family in floral bud differentiation. 
Co-expression networks were constructed for 29 walnut bZIP genes and 6 flowering genes. During floral bud differentiation, the expression of JrCO (a homolog of AtCO) was significantly correlated (p < 0.05) with that of 13 JrbZIP genes, including JrbZIP31 (a homolog of AtFD), and JrLFY was significantly and positively correlated with JrbZIP10, JrbZIP11, JrbZIP51, JrbZIP59 and JrbZIP67 (p < 0.05). These results suggest that bZIP family genes may act together with flowering genes to regulate flower bud differentiation in walnut. This study is the first genome-wide report on the walnut bZIP gene family; it could improve our understanding of walnut bZIP proteins and provide a solid foundation for future cloning and functional analyses of this gene family.
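The abstract reports a codon bias but does not name the statistic used. Relative synonymous codon usage (RSCU) is a common choice, so the following is a minimal sketch assuming RSCU computed over the standard genetic code; the function name `rscu` and the toy sequence are illustrative, not taken from the study. A codon with RSCU above 1 is used more often than expected if all synonymous codons were used equally.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered UUU, UUC, UUA, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by the amino acid (or stop, "*") they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def rscu(coding_sequences):
    """Relative synonymous codon usage pooled over a list of in-frame CDSs.

    RSCU = observed count / (total count for that amino acid / number of
    synonymous codons). Amino acids with no observed codons are skipped.
    """
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # ignore codons containing N, gaps, etc.
                counts[codon] += 1

    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            # Multiply before dividing so simple ratios stay exact floats.
            values[c] = counts[c] * len(codons) / total
    return values


# Toy example: arginine is encoded by six codons (AGA, AGG, CGU, CGC, CGA, CGG).
print(rscu(["AGAAGAAGGCGU"])["AGA"])  # 3.0
```

Single-codon amino acids (Met, Trp) always get an RSCU of 1 and are often excluded from codon-bias summaries.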
178
Yan J, Jiang T, Liu J, Lu Y, Guan S, Li H, Wu H, Ding Y. DNA-binding protein prediction based on deep transfer learning. Math Biosci Eng 2022; 19:7719-7736. [PMID: 35801442 DOI: 10.3934/mbe.2022362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The study of DNA-binding proteins (DBPs) is of great importance in the biomedical field. At present, many researchers are working on the prediction and detection of DBPs. Traditional DBP prediction mainly relies on machine learning methods. Although these methods can achieve relatively high prediction accuracy, they require large amounts of human effort and material resources. Transfer learning has certain advantages in dealing with such prediction problems. Therefore, in the present study, two features were extracted from protein sequences, and two classical transfer learning algorithms were compared for transferring samples and constructing datasets. Finally, DBPs are detected with a deep neural network model that uses an attention mechanism.
Affiliation(s)
- Jun Yan
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Tengsheng Jiang
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Junkai Liu
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Yaoyao Lu
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Shixuan Guan
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Haiou Li
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Hongjie Wu
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Suzhou Smart City Research Institute, Suzhou University of Science and Technology, Suzhou, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
179
Simpkin AJ, Thomas JMH, Keegan RM, Rigden DJ. MrParse: finding homologues in the PDB and the EBI AlphaFold database for molecular replacement and more. Acta Crystallogr D Struct Biol 2022; 78:553-559. [PMID: 35503204 PMCID: PMC9063843 DOI: 10.1107/s2059798322003576] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/31/2021] [Accepted: 03/29/2022] [Indexed: 11/10/2022]
Abstract
Crystallographers have an array of search-model options for structure solution by molecular replacement (MR). The well-established options of homologous experimental structures and regular secondary-structure elements or motifs are increasingly supplemented by computational modelling. Such modelling may be carried out locally or may use pre-calculated predictions retrieved from databases such as the EBI AlphaFold database. MrParse is a new pipeline that helps streamline the decision process in MR by consolidating bioinformatic predictions in one place. When reflection data are provided, MrParse can rank any experimental homologues found using the expected log-likelihood gain (eLLG), which indicates the likelihood that a given search model will succeed in MR. Built-in displays of predicted secondary structure, coiled-coil and transmembrane regions further inform the choice of MR protocol. MrParse can also identify and rank homologues in the EBI AlphaFold database, a function that should also interest other structural biologists and bioinformaticians.
Affiliation(s)
- Adam J. Simpkin
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Jens M. H. Thomas
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Ronan M. Keegan
- UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
- Daniel J. Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
180
Villegas-Morcillo A, Gomez AM, Sanchez V. An analysis of protein language model embeddings for fold prediction. Brief Bioinform 2022; 23:6571527. [PMID: 35443054 DOI: 10.1093/bib/bbac142] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/01/2022] [Revised: 03/21/2022] [Accepted: 03/28/2022] [Indexed: 11/13/2022]
Abstract
The identification of the protein fold class is a challenging problem in structural biology. Recent computational methods for fold prediction leverage deep learning techniques to extract fold-representative protein embeddings, mainly using evolutionary information in the form of multiple sequence alignments (MSAs) as the input source. In contrast, protein language models (LMs) have reshaped the field thanks to their ability to learn efficient protein representations (protein-LM embeddings) from sequence information alone in a self-supervised manner. In this paper, we analyze a framework for protein fold prediction that uses pre-trained protein-LM embeddings as input to several fine-tuning neural network models, which are trained in a supervised manner with fold labels. In particular, we compare the performance of six protein-LM embeddings: the long short-term memory-based UniRep and SeqVec, and the transformer-based ESM-1b, ESM-MSA, ProtBERT and ProtT5; as well as three neural networks: Multi-Layer Perceptron, ResCNN-BGRU (RBG) and Light-Attention (LAT). We evaluated the pairwise fold recognition (PFR) and direct fold classification (DFC) tasks separately on well-known benchmark datasets. The results indicate that the combination of transformer-based embeddings, particularly those obtained at the amino acid level, with the RBG and LAT fine-tuning models performs remarkably well in both tasks. To further increase prediction accuracy, we propose several ensemble strategies for PFR and DFC, which provide a significant performance boost over current state-of-the-art results. All this suggests that moving from traditional protein representations to protein-LM embeddings is a very promising approach to protein fold-related tasks.
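Pairwise fold recognition asks whether two proteins share a fold. The study feeds protein-LM embeddings into trained neural networks; as a much simpler, hypothetical baseline, the sketch below mean-pools per-residue embeddings into one vector per protein and ranks candidate templates by cosine similarity. The function names and the toy two-dimensional vectors are illustrative only; real ProtT5 and ESM-1b embeddings have 1,024 and 1,280 dimensions per residue, respectively.

```python
import math


def mean_pool(residue_embeddings):
    """Average per-residue vectors (a list of equal-length lists) into one vector."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[k] for vec in residue_embeddings) / n for k in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_templates(query_residues, templates):
    """Return template names sorted from most to least similar to the query."""
    q = mean_pool(query_residues)
    scores = {name: cosine(q, mean_pool(res)) for name, res in templates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-D "embeddings" for a 3-residue query and two single-residue templates.
query = [[1.0, 0.0], [0.8, 0.2], [1.0, 0.1]]
templates = {"fold_A": [[0.9, 0.1]], "fold_B": [[0.0, 1.0]]}
print(rank_templates(query, templates))  # ['fold_A', 'fold_B']
```

Mean pooling discards residue order, which is one reason the study's per-residue (amino acid level) embeddings combined with RBG or LAT models can do better than a fixed similarity score.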
Affiliation(s)
- Amelia Villegas-Morcillo
- Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
- Angel M Gomez
- Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
- Victoria Sanchez
- Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
181
Zheng W, Wuyun Q, Zhou X, Li Y, Freddolino PL, Zhang Y. LOMETS3: integrating deep learning and profile alignment for advanced protein template recognition and function annotation. Nucleic Acids Res 2022; 50:W454-W464. [PMID: 35420129 PMCID: PMC9252734 DOI: 10.1093/nar/gkac248] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/06/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 11/25/2022]
Abstract
Deep learning techniques have significantly advanced the field of protein structure prediction. LOMETS3 (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is a new-generation meta-server approach to template-based protein structure prediction and function annotation, which integrates newly developed deep learning threading methods. For the first time, LOMETS has been extended to handle multi-domain proteins and to construct full-length models with gradient-based optimization. Starting from a FASTA-formatted sequence, LOMETS3 performs four steps: domain boundary prediction, domain-level template identification, full-length template/model assembly and structure-based function prediction. The output of LOMETS3 contains (i) top-ranked templates from LOMETS3 and its component threading programs, (ii) up to 5 full-length structure models constructed by L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm) optimization, (iii) the 10 closest Protein Data Bank (PDB) structures to the target, (iv) structure-based functional predictions, (v) domain partition and assembly results, and (vi) the domain-level threading results, including items (i)–(iii) for each identified domain. LOMETS3 was tested in large-scale benchmarks and in the blind CASP14 (14th Critical Assessment of Structure Prediction) experiment, where its overall template recognition and function prediction accuracy was significantly higher than that of its predecessors and other state-of-the-art threading approaches, especially for hard targets without homologous templates in the PDB. With these improvements, LOMETS3 should significantly advance the capability of the broader biomedical community for template-based protein structure and function modelling.
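L-BFGS approximates Newton steps from the last few position and gradient differences instead of storing a full Hessian, which suits problems with many variables such as atomic coordinates. The following is a generic, minimal sketch of the standard two-loop recursion with a simple backtracking (Armijo) line search, assuming a smooth objective with an analytic gradient. It is not LOMETS3's code, and the toy quadratic `bowl` is a hypothetical stand-in for whatever energy the model-building step minimizes (the abstract does not specify it).

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def lbfgs(f, grad, x0, m=5, max_iter=200, tol=1e-8):
    """Minimize f from x0 using L-BFGS with history size m and Armijo backtracking."""
    x = list(x0)
    g = grad(x)
    s_hist, y_hist = [], []  # recent steps s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        # Two-loop recursion: turns g into r, an approximation of H^{-1} g.
        q = list(g)
        history = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest pair first
            rho = 1.0 / dot(y, s)
            alpha = rho * dot(s, q)
            history.append((rho, alpha))
            q = [qi - alpha * yi for qi, yi in zip(q, y)]
        # Scale the initial inverse-Hessian guess by the most recent curvature.
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
        r = [gamma * qi for qi in q]
        for (s, y), (rho, alpha) in zip(zip(s_hist, y_hist), reversed(history)):  # oldest first
            beta = rho * dot(y, r)
            r = [ri + si * (alpha - beta) for ri, si in zip(r, s)]
        d = [-ri for ri in r]  # quasi-Newton search direction

        # Backtracking line search until the Armijo sufficient-decrease test passes.
        t, fx, slope = 1.0, f(x), dot(g, d)
        while t > 1e-12 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)

        s_new = [xn - xo for xn, xo in zip(x_new, x)]
        y_new = [gn - go for gn, go in zip(g_new, g)]
        if dot(s_new, y_new) > 1e-12:  # store only pairs with positive curvature
            s_hist.append(s_new)
            y_hist.append(y_new)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x


def bowl(x):
    """Elongated quadratic with its minimum at (3, -1)."""
    return (x[0] - 3) ** 2 + 10 * (x[1] + 1) ** 2


def bowl_grad(x):
    return [2 * (x[0] - 3), 20 * (x[1] + 1)]


print(lbfgs(bowl, bowl_grad, [0.0, 0.0]))
```

Production code would normally call a tested implementation (for example SciPy's L-BFGS-B) rather than a hand-written loop; the sketch only makes the memory-limited update visible.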
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Peter L Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
182
McGreig JE, Uri H, Antczak M, Sternberg MJE, Michaelis M, Wass MN. 3DLigandSite: structure-based prediction of protein-ligand binding sites. Nucleic Acids Res 2022; 50:W13-W20. [PMID: 35412635 PMCID: PMC9252821 DOI: 10.1093/nar/gkac250] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 02/09/2022] [Revised: 03/13/2022] [Accepted: 04/03/2022] [Indexed: 01/13/2023]
Abstract
3DLigandSite is a web tool for the prediction of ligand-binding sites in proteins. Here, we report a significant update to 3DLigandSite since its first release in 2010. The overall methodology remains the same, with candidate binding sites in proteins inferred using known binding sites in related protein structures as templates. However, the initial structural modelling step now uses the newly available structures from the AlphaFold database, or Phyre2 when AlphaFold structures are not available. Further, a sequence-based search using HHSearch has been introduced to identify template structures with bound ligands, which are used to infer the ligand-binding residues in the query protein. Finally, a machine learning step has been added as the final prediction stage; it improves prediction accuracy and provides a confidence score for each residue predicted to be part of a binding site. Validation of 3DLigandSite on a set of 6416 binding sites achieved 92% recall at 75% precision for non-metal binding sites and 52% recall at 75% precision for metal binding sites. 3DLigandSite is available at https://www.wass-michaelislab.org/3dligandsite. Users submit either a protein sequence or a structure. Results are displayed in multiple formats, including an interactive Mol* molecular visualization of the protein and the predicted binding sites.
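The validation figures are stated as recall at a fixed precision of 75%. Assuming each residue receives a confidence score and a true/false binding label, such a number can be obtained by sweeping a score threshold and keeping the best recall whose precision still meets the target. The sketch below is a hypothetical illustration with made-up scores, not the authors' evaluation code; the function name `recall_at_precision` is an assumption.

```python
def recall_at_precision(scores, labels, min_precision=0.75):
    """Best recall over score thresholds whose precision is >= min_precision.

    scores: confidence per residue; labels: 1 if the residue truly binds, else 0.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    best = 0.0
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Evaluate only after every residue tied at this score has been counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
print(recall_at_precision(scores, labels))  # 1.0
```

Here the threshold 0.6 keeps four residues (three true binders, one false positive), giving exactly 75% precision and 100% recall.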
Affiliation(s)
- Jake E McGreig
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Hannah Uri
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Magdalena Antczak
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Martin Michaelis
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Mark N Wass
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
183
Chao J, Tang F, Xu L. Developments in Algorithms for Sequence Alignment: A Review. Biomolecules 2022; 12:biom12040546. [PMID: 35454135 PMCID: PMC9024764 DOI: 10.3390/biom12040546] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/11/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 01/27/2023]
Abstract
The continuous development of sequencing technologies has enabled researchers to obtain large amounts of biological sequence data, which has increased the demand for software that can perform sequence alignment quickly and accurately. Many algorithms and tools for sequence alignment have been designed to meet the various needs of biologists. Here, we summarize the prevailing ideas in sequence alignment research, as well as some quality estimation methods for multiple sequence alignment tools.
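As a concrete reference point for the pairwise methods that such reviews cover, here is a minimal sketch of the classic Needleman-Wunsch global alignment with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) are arbitrary illustrative choices; practical protein aligners typically use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)  # 5
print(top)
print(bottom)
```

The quadratic time and memory of this dynamic program are what heuristic and banded aligners try to avoid on large datasets.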
Affiliation(s)
- Jiannan Chao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Furong Tang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
184
Malladi S, Powell HR, David A, Islam SA, Copeland MM, Kundrotas PJ, Sternberg MJ, Vakser IA. GWYRE: A resource for mapping variants onto experimental and modeled structures of human protein complexes. J Mol Biol 2022; 434:167608. [PMID: 35662458 PMCID: PMC9188266 DOI: 10.1016/j.jmb.2022.167608] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2021] [Revised: 03/31/2022] [Accepted: 04/20/2022] [Indexed: 02/08/2023]
Abstract
- The structure of protein complexes is important for interpreting genetic variation.
- Data on single amino acid variants are available from high-throughput sequencing.
- An integrated modeling approach was applied to proteins and their complexes.
- The GWYRE resource incorporates predicted protein complexes with mapped mutations.
Rapid progress in structural modeling of proteins and their interactions is powered by advances in knowledge-based methodologies along with a better understanding of the physical principles of protein structure and function. The pool of structural data for modeling proteins and protein–protein complexes is constantly increasing due to the rapid growth of protein interaction databases and the Protein Data Bank. The GWYRE (Genome Wide PhYRE) project capitalizes on these developments by advancing and applying powerful new methodologies to the structural modeling of protein–protein interactions and genetic variation. The methods integrate knowledge-based tertiary structure prediction using Phyre2 with quaternary structure prediction by template-based docking, using a full-structure alignment protocol, to generate models of binary complexes. The predictions are incorporated in a comprehensive public resource for the structural characterization of the human interactome and the location of human genetic variants. The GWYRE resource facilitates a better understanding of the principles of protein interaction and of structure/function relationships. The resource is available at http://www.gwyre.org.
185
Vishwakarma P, Vattekatte AM, Shinada N, Diharce J, Martins C, Cadet F, Gardebien F, Etchebest C, Nadaradjane AA, de Brevern AG. VHH Structural Modelling Approaches: A Critical Review. Int J Mol Sci 2022; 23:3721. [PMID: 35409081 PMCID: PMC8998791 DOI: 10.3390/ijms23073721] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/25/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/20/2022]
Abstract
VHHs, i.e., the VH domains of camelid single-chain antibodies, are very promising therapeutic agents owing to their significant physicochemical advantages over classical mammalian antibodies. The number of experimentally solved VHH structures has increased significantly in recent years, which is of great help because it makes it possible to work directly on 3D structures to humanise or improve them. Unfortunately, most VHHs still have no experimentally determined 3D structure. Thus, alternative sources of structural information are needed, and methods that predict structure from the primary amino acid sequence offer a way to bypass this limitation. This review presents the most extensive overview of structure prediction methods applied to the 3D modelling of a given VHH sequence (21 in total). Besides the historical overview, it aims to show how modelling software programs have shaped the structural prediction of VHHs. A brief explanation of each methodology is supplied, and pertinent examples of their usage are provided. Finally, we present a structure prediction case study of a recently solved VHH structure. According to some recent studies and the present analysis, AlphaFold 2 and NanoNet appear to be the best tools for predicting a structural model of a VHH from its sequence.
Affiliation(s)
- Poonam Vishwakarma
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Akhila Melarkode Vattekatte
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Julien Diharce
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- Carla Martins
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Frédéric Cadet
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- PEACCEL, Artificial Intelligence Department, Square Albin Cachot, F-75013 Paris, France
- Fabrice Gardebien
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Catherine Etchebest
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- Aravindan Arun Nadaradjane
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
- Alexandre G. de Brevern
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-75015 Paris, France
- INSERM UMR_S 1134, BIGR, DSIMB Team, Université de Paris and Université de la Réunion, F-97715 Saint Denis Messag, France
186
Kabir MN, Wong L. EnsembleFam: towards more accurate protein family prediction in the twilight zone. BMC Bioinformatics 2022; 23:90. [PMID: 35287576 PMCID: PMC8919565 DOI: 10.1186/s12859-022-04626-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/11/2021] [Accepted: 03/02/2022] [Indexed: 11/30/2022]
Abstract
Background: Current protein family modeling methods, such as profile hidden Markov models (pHMMs), k-mer-based methods and deep learning-based methods, do not predict function very accurately for proteins in the twilight zone, owing to their low sequence similarity to reference proteins of known function. Results: We present EnsembleFam, a novel method aimed at better function prediction for proteins in the twilight zone. EnsembleFam extracts the core characteristics of a protein family using similarity and dissimilarity features calculated from sequence homology relations. It trains three separate support vector machine (SVM) classifiers for each family using these features, and an ensemble prediction is made to classify novel proteins into these families. Extensive experiments were conducted using the Clusters of Orthologous Groups (COG) dataset and a G protein-coupled receptor (GPCR) dataset. EnsembleFam not only outperforms state-of-the-art methods on the overall dataset but also provides much more accurate predictions for twilight-zone proteins. Conclusions: EnsembleFam, a machine learning method for modeling protein families, can better identify family members with very low sequence homology. With EnsembleFam, protein function can be predicted from sequence information alone with better accuracy than with state-of-the-art methods.
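The abstract says that EnsembleFam trains three SVM classifiers per family and combines them into an ensemble prediction, but not how their outputs are combined. The sketch below assumes one plausible rule: a family is accepted when a majority of its classifiers return a positive decision value, and among accepted families the one with the most votes wins, with ties broken by the summed decision values. The function name, family IDs and scores are hypothetical.

```python
def ensemble_predict(decision_scores):
    """Pick a family from per-family SVM decision values (hypothetical rule).

    decision_scores maps family ID -> list of signed decision values, one per
    classifier (positive means "member"). A family is accepted when a majority
    of its classifiers vote yes; the accepted family with the most votes wins,
    ties broken by summed score. Returns None if no family is accepted.
    """
    best_family, best_key = None, None
    for family, scores in decision_scores.items():
        votes = sum(1 for s in scores if s > 0)
        if votes * 2 <= len(scores):  # no majority: reject this family
            continue
        key = (votes, sum(scores))
        if best_key is None or key > best_key:
            best_family, best_key = family, key
    return best_family


example = {
    "fam_A": [0.5, -0.2, 0.1],
    "fam_B": [0.9, 0.4, 0.3],
    "fam_C": [-1.0, -0.7, -0.4],
}
print(ensemble_predict(example))  # fam_B
```

Returning None when no family is accepted lets a twilight-zone protein remain unassigned rather than being forced into the least-bad family.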
Affiliation(s)
- Mohammad Neamul Kabir
- Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore
- Limsoon Wong
- Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore
187
Feng Q, Hou M, Liu J, Zhao K, Zhang G. Construct a variable-length fragment library for de novo protein structure prediction. Brief Bioinform 2022; 23:6547572. [PMID: 35284936 DOI: 10.1093/bib/bbac086] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/03/2022] [Revised: 02/10/2022] [Accepted: 02/20/2022] [Indexed: 11/12/2022]
Abstract
Despite remarkable achievements in end-to-end structure prediction, such as AlphaFold2, fragment libraries remain essential for de novo protein structure prediction and can help explore and understand the protein-folding mechanism. In this work, we developed a variable-length fragment library (VFlib). In VFlib, a master structure database was first constructed from the Protein Data Bank through sequence clustering. The hidden Markov model (HMM) profile of each protein in the master structure database was generated with HHsuite, and the secondary structure of each protein was calculated with DSSP. For a query sequence, an HMM profile was first constructed. Variable-length fragments were then retrieved from the master structure database through dynamic variable-length profile-profile comparison. A complete method for chopping the query HMM profile during this process was proposed to obtain more diverse fragments. Finally, secondary structure information was used to further screen the retrieved fragments and generate the final fragment library for the query sequence. Experimental results on a set of 120 nonredundant proteins show that the global precision and coverage of the fragment libraries generated by VFlib were 55.04% and 94.95%, respectively, at an RMSD cutoff of 1.5 Å. Compared with the benchmark method NNMake, the global precision of our fragment library increased by 62.89% at equivalent coverage. Furthermore, the fragments generated by VFlib and NNMake were used to predict structure models through fragment assembly. Controlled experiments demonstrate that the average TM-score of models built with VFlib fragments was 16.00% higher than that of models built with NNMake fragments.
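The precision and coverage figures are quoted at an RMSD cutoff of 1.5 Å. Assuming the definitions commonly used in fragment-library benchmarks (precision: the fraction of retrieved fragments within the cutoff of the native structure; coverage: the fraction of query residues covered by at least one such fragment), the computation looks roughly like the sketch below. It takes precomputed fragment RMSDs as input and skips structural superposition; the function name and the toy data are hypothetical.

```python
def precision_and_coverage(fragments, query_length, cutoff=1.5):
    """Fragment-library precision and coverage at an RMSD cutoff (in Å).

    fragments: list of (start, length, rmsd) tuples, where start is a 0-based
    query position and rmsd is the fragment's RMSD to the native segment.
    """
    if not fragments or query_length <= 0:
        return 0.0, 0.0
    good = [(start, length) for start, length, rmsd in fragments if rmsd <= cutoff]
    precision = len(good) / len(fragments)
    covered = set()
    for start, length in good:
        # Clip to the query so out-of-range fragments cannot inflate coverage.
        covered.update(range(max(start, 0), min(start + length, query_length)))
    coverage = len(covered) / query_length
    return precision, coverage


toy = [(0, 9, 1.2), (5, 9, 2.0), (10, 7, 0.8), (3, 5, 1.5)]
print(precision_and_coverage(toy, query_length=20))  # (0.75, 0.8)
```

In the toy library, three of four fragments pass the cutoff, and together they cover residues 0 to 8 and 10 to 16, that is 16 of 20 positions.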
Affiliation(s)
- Qiongqiong Feng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Minghua Hou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
188
Langenfeld F, Aderinwale T, Christoffer C, Shin WH, Terashi G, Wang X, Kihara D, Benhabiles H, Hammoudi K, Cabani A, Windal F, Melkemi M, Otu E, Zwiggelaar R, Hunter D, Liu Y, Sirugue L, Nguyen HNH, Nguyen TDH, Nguyen-Truong VT, Le D, Nguyen HD, Tran MT, Montès M. Surface-based protein domains retrieval methods from a SHREC2021 challenge. J Mol Graph Model 2022; 111:108103. [PMID: 34959149 PMCID: PMC9746607 DOI: 10.1016/j.jmgm.2021.108103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/09/2021] [Revised: 11/29/2021] [Accepted: 12/04/2021] [Indexed: 12/15/2022]
Abstract
Proteins are essential to nearly all cellular mechanisms and are the effectors of the cell's activities. As such, they often interact through their surface with other proteins or with other cellular ligands such as ions or organic molecules. Evolution generates many different proteins with unique abilities, but also proteins with related functions and hence similar 3D surface properties (shape, physico-chemical properties, …). Protein surfaces are therefore of primary importance for protein activity. In the present work, we assess the ability of different methods to detect such similarities based on the geometry of protein surfaces (described as 3D meshes), using either their shape only, or their shape and the electrostatic potential (a biologically relevant property of protein surfaces). Five different groups participated in this contest using the shape-only dataset, and one group extended its pre-existing method to handle the electrostatic potential. Our comparative study reveals both the ability of the methods to detect related proteins and their difficulty in distinguishing between highly related proteins. Our study also allows us to analyze the putative influence of electrostatic information in addition to that of protein shape alone. Finally, for the extended method, we discuss the results with respect to those obtained in previous contests. The source code of each presented method has been made available online.
Collapse
Affiliation(s)
- Florent Langenfeld
- Laboratoire de Génomique, Bio-informatique et Chimie Moléculaire (GBCM), EA 7528, Conservatoire National des Arts-et-Métiers, HESAM Université, 2, rue Conté, Paris, 75003, France,Corresponding author: (F. Langenfeld)
| | - Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Woong-Hee Shin
- Department of Chemical Science Education, Sunchon National University, Suncheon, 57922, Republic of Korea
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA; Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Halim Benhabiles
- Univ. Lille, CNRS, Centrale Lille, Univ. Polytechnique Hauts-de-France, Junia, UMR 8520, IEMN - Institut d’Electronique de Microélectronique et de Nanotechnologie, F-59 000, Lille, France
- Karim Hammoudi
- Université de Haute-Alsace, Department of Computer Science, IRIMAS, F-68 100, Mulhouse, France; Université de Strasbourg, France
- Adnane Cabani
- Normandie University, UNIROUEN, ESIGELEC, IRSEEM, 76000, Rouen, France
- Feryal Windal
- Univ. Lille, CNRS, Centrale Lille, Univ. Polytechnique Hauts-de-France, Junia, UMR 8520, IEMN - Institut d’Electronique de Microélectronique et de Nanotechnologie, F-59 000, Lille, France
- Mahmoud Melkemi
- Université de Haute-Alsace, Department of Computer Science, IRIMAS, F-68 100, Mulhouse, France; Université de Strasbourg, France
- Ekpo Otu
- Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3FL, UK
- Reyer Zwiggelaar
- Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3FL, UK
- David Hunter
- Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3FL, UK
- Yonghuai Liu
- Department of Computer Science, Edge Hill University, Ormskirk, L39 4QP, UK
- Léa Sirugue
- Laboratoire de Génomique, Bio-informatique et Chimie Moléculaire (GBCM), EA 7528, Conservatoire National des Arts-et-Métiers, HESAM Université, 2, rue Conté, Paris, 75003, France
- Huu-Nghia H. Nguyen
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Tuan-Duy H. Nguyen
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Danh Le
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Hai-Dang Nguyen
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Minh-Triet Tran
- University of Science, VNU-HCM, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNU-HCM, Viet Nam
- Matthieu Montès
- Laboratoire de Génomique, Bio-informatique et Chimie Moléculaire (GBCM), EA 7528, Conservatoire National des Arts-et-Métiers, HESAM Université, 2, rue Conté, Paris, 75003, France; corresponding author: M. Montès
189
Thibau A, Hipp K, Vaca DJ, Chowdhury S, Malmström J, Saragliadis A, Ballhorn W, Linke D, Kempf VAJ. Long-Read Sequencing Reveals Genetic Adaptation of Bartonella Adhesin A Among Different Bartonella henselae Isolates. Front Microbiol 2022; 13:838267. [PMID: 35197960 PMCID: PMC8859334 DOI: 10.3389/fmicb.2022.838267] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/17/2021] [Accepted: 01/17/2022] [Indexed: 11/30/2022]
Abstract
Bartonella henselae is the causative agent of cat scratch disease and other clinical entities such as endocarditis and bacillary angiomatosis. The life cycle of this pathogen, with alternating host conditions, drives evolutionary and host-specific adaptations. Human, feline, and laboratory-adapted B. henselae isolates often display genomic and phenotypic differences related to the expression of outer membrane proteins, for example Bartonella adhesin A (BadA). This modularly structured trimeric autotransporter adhesin is a major virulence factor of B. henselae and is crucial for the initial binding to the host via the extracellular matrix proteins fibronectin and collagen. Using next-generation long-read sequencing, we demonstrate a conserved genome among eight B. henselae isolates and identify a variable genomic badA island with a diversified and highly repetitive badA gene flanked by badA pseudogenes. Two of the eight tested B. henselae strains lack BadA expression because of frameshift mutations. We suggest that active recombination mechanisms within the repetitive badA island, possibly via phase variation (i.e., slipped-strand mispairing and site-specific recombination), facilitate reshuffling of homologous domain arrays. The resulting variations among the different BadA proteins might contribute to host immune evasion and enhance long-term and efficient colonisation in the differing host environments. Considering the role of BadA as a key virulence factor, it remains important to check consistently and regularly for BadA surface expression during experimental infection procedures.
Affiliation(s)
- Arno Thibau
- Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt am Main, Germany
- Katharina Hipp
- Electron Microscopy Facility, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Diana J Vaca
- Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt am Main, Germany
- Sounak Chowdhury
- Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden
- Johan Malmström
- Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden
- Athanasios Saragliadis
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
- Wibke Ballhorn
- Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt am Main, Germany
- Dirk Linke
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
- Volkhard A J Kempf
- Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt am Main, Germany
190
Delerue T, Anantharaman V, Gilmore MC, Popham DL, Cava F, Aravind L, Ramamurthi KS. Bacterial developmental checkpoint that directly monitors cell surface morphogenesis. Dev Cell 2022; 57:344-360.e6. [PMID: 35065768 PMCID: PMC8991396 DOI: 10.1016/j.devcel.2021.12.021] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/26/2021] [Revised: 11/15/2021] [Accepted: 12/20/2021] [Indexed: 01/05/2023]
Abstract
Bacillus subtilis spores are encased in two concentric shells: an outer proteinaceous "coat" and an inner peptidoglycan "cortex," separated by a membrane. Cortex assembly depends on coat assembly initiation, but how cells achieve this coordination across the membrane is unclear. Here, we report that the protein SpoVID monitors the polymerization state of the coat basement layer via an extension to a functional intracellular LysM domain that arrests sporulation when coat assembly is initiated improperly. Whereas extracellular LysM domains bind mature peptidoglycan, SpoVID LysM binds to the membrane-bound lipid II peptidoglycan precursor. We propose that improper coat assembly exposes the SpoVID LysM domain, which then sequesters lipid II and prevents cortex assembly. SpoVID defines a widespread group of firmicute proteins with a characteristic N-terminal domain and C-terminal peptidoglycan-binding domains that might combine coat and cortex assembly roles to mediate a developmental checkpoint linking the morphogenesis of two spatially separated supramolecular structures.
Affiliation(s)
- Thomas Delerue
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Vivek Anantharaman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Michael C. Gilmore
- Laboratory for Molecular Infection Medicine Sweden (MIMS), Department of Molecular Biology, Umeå University, 90187 Umeå, Sweden
- David L. Popham
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Felipe Cava
- Laboratory for Molecular Infection Medicine Sweden (MIMS), Department of Molecular Biology, Umeå University, 90187 Umeå, Sweden
- L. Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Kumaran S. Ramamurthi
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA; lead contact; correspondence
191
Kong L, Ju F, Zheng WM, Zhu J, Sun S, Xu J, Bu D. ProALIGN: Directly Learning Alignments for Protein Structure Prediction via Exploiting Context-Specific Alignment Motifs. J Comput Biol 2022; 29:92-105. [PMID: 35073170 PMCID: PMC8892980 DOI: 10.1089/cmb.2021.0430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence and templates with solved structures. However, it is still very challenging to build the optimal sequence-template alignment, especially when only distantly related templates are available. Here we report a novel deep learning approach, ProALIGN, that can predict much more accurate sequence-template alignments. Just as protein sequences consist of sequence motifs, protein alignments are composed of frequently occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific, as their characteristic patterns are tightly related to the sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair) and then use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces inaccuracies in the handcrafted scoring functions widely used by current threading approaches. For a query protein and a template, we apply the neural network to directly infer the likelihoods of all possible residue pairs in their entirety, which could effectively consider the correlations among multiple residues. We further construct the alignment with maximum likelihood, and finally build a structure model according to the alignment. Tested on three independent data sets with a total of 6688 protein alignment targets and 80 CASP13 TBM targets, our method achieved much better alignments and 3D structure models than existing methods, including HHpred, CNFpred, CEthreader, and DeepThreader. These results clearly demonstrate the effectiveness of exploiting context-specific alignment motifs by deep learning for protein threading.
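To make the binary-matrix representation of an alignment concrete, here is a minimal Python sketch. It assumes the pairwise alignment is given as two equal-length gapped strings with '-' marking gaps; the function name `alignment_to_binary_matrix` is hypothetical and is not taken from the ProALIGN code. Rows index query residues, columns index template residues, and a 1 marks an aligned residue pair.

```python
def alignment_to_binary_matrix(query_aln: str, template_aln: str) -> list[list[int]]:
    """Encode a gapped pairwise alignment as a query-by-template 0/1 matrix.

    Hypothetical helper for illustration only; '-' is assumed to be the gap character.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    n_query = sum(c != "-" for c in query_aln)        # ungapped query length
    n_template = sum(c != "-" for c in template_aln)  # ungapped template length
    matrix = [[0] * n_template for _ in range(n_query)]
    i = j = 0  # current residue indices in the query and the template
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            matrix[i][j] = 1  # aligned residue pair
        if q != "-":
            i += 1  # advance past a query residue
        if t != "-":
            j += 1  # advance past a template residue
    return matrix


if __name__ == "__main__":
    for row in alignment_to_binary_matrix("AC-DE", "A-GDE"):
        print(row)
```

On the toy alignment `AC-DE` / `A-GDE`, the sketch yields a 4 × 4 matrix with ones at (0, 0), (2, 2) and (3, 3): the unmatched query residue C gives an all-zero row and the template insertion G an all-zero column. In this encoding, predicting an alignment amounts to predicting which cells of such a matrix are 1.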
Affiliation(s)
- Lupeng Kong
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Toyota Technological Institute, Chicago, Illinois, USA
- Fusong Ju
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Wei-mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jinbo Xu
- Toyota Technological Institute, Chicago, Illinois, USA
- Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
192
Catalytic trajectory of a dimeric nonribosomal peptide synthetase subunit with an inserted epimerase domain. Nat Commun 2022; 13:592. [PMID: 35105906 PMCID: PMC8807600 DOI: 10.1038/s41467-022-28284-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/27/2021] [Accepted: 01/04/2022] [Indexed: 11/16/2022]
Abstract
Nonribosomal peptide synthetases (NRPSs) are modular assembly-line megaenzymes that synthesize diverse metabolites with wide-ranging biological activities. The structural dynamics of synthetic elongation has remained unclear. Here, we present cryo-EM structures of PchE, an NRPS elongation module, in distinct conformations. The domain organization reveals a unique “H”-shaped head-to-tail dimeric architecture. The capture of both aryl and peptidyl carrier protein-tethered substrates and intermediates inside the heterocyclization domain and L-cysteinyl adenylate in the adenylation domain illustrates the catalytic and recognition residues. The multilevel structural transitions guided by the adenylation C-terminal subdomain in combination with the inserted epimerase and the conformational changes of the heterocyclization tunnel are controlled by two residues. Moreover, we visualized the direct structural dynamics of the full catalytic cycle from thiolation to epimerization. This study establishes the catalytic trajectory of PchE and sheds light on the rational re-engineering of domain-inserted dimeric NRPSs for the production of novel pharmaceutical agents. The catalytic domains in nonribosomal peptide synthetases (NRPSs) are responsible for a choreography of events that elongates substrates into natural products. Here, the authors present cryo-EM structures of a siderophore-producing dimeric NRPS elongation module in multiple distinct conformations, which provides insight into the mechanisms of catalytic trajectory.
193
Bhattacharya S, Roche R, Moussad B, Bhattacharya D. DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins. Proteins 2022; 90:579-588. [PMID: 34599831 PMCID: PMC8738102 DOI: 10.1002/prot.26254] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/24/2021] [Revised: 09/22/2021] [Accepted: 09/28/2021] [Indexed: 02/03/2023]
Abstract
Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact-assisted or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches, and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
194
Li G, Phetsanthad A, Ma M, Yu Q, Nair A, Zheng Z, Ma F, DeLaney K, Hong S, Li L. Native Ion Mobility-Mass Spectrometry-Enabled Fast Structural Interrogation of Labile Protein Surface Modifications at the Intact Protein Level. Anal Chem 2022; 94:2142-2153. [PMID: 35050568 DOI: 10.1021/acs.analchem.1c04503] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Abstract
Protein sialylation has been closely linked to many diseases, including Alzheimer's disease (AD). It is also broadly implicated in therapeutics operating in a pattern-dependent (e.g., Neu5Ac vs Neu5Gc) manner. However, how the sialylation pattern affects the AD-associated, transferrin-assisted iron/Aβ cellular uptake process remains largely ill-defined. Herein, we report a native ion mobility-mass spectrometry (IM-MS)-based fast structural probing methodology that enables well-controlled, synergistic, and in situ manipulation of mature glycoproteins and their attached sialic acids. IM-MS-centered experiments enable the combinatorial interrogation of sialylation effects on Aβ cytotoxicity and on the chemical, conformational, and topological stabilities of transferrin. Cell viability experiments suggest that Neu5Gc replacement enhances the transferrin-assisted, iron loading-associated Aβ cytotoxicity. Native gel electrophoresis and IM-MS reveal that sialylation stabilizes transferrin conformation but inhibits its dimerization. Collectively, IM-MS is adapted to capture key sialylation intermediates involved in fine-tuning AD-associated glycoprotein structural microheterogeneity. Our results provide the molecular basis for the importance of sustaining moderate transferrin (TF) sialylation levels, especially Neu5Ac, in promoting iron cellular transportation and rescuing iron-enhanced Aβ cytotoxicity.
Affiliation(s)
- Gongyu Li
- Research Center for Analytical Science and Tianjin Key Laboratory of Biosensing and Molecular Recognition, College of Chemistry, Nankai University, Tianjin 300071, China
- Zhen Zheng
- School of Pharmacy, Tianjin Medical University, Tianjin 300070, China
- Fengfei Ma
- Protein Sciences, Discovery Biologics, Merck & Co., Inc., South San Francisco, California 94080, United States
195
Johnson AG, Wein T, Mayer ML, Duncan-Lowey B, Yirmiya E, Oppenheimer-Shaanan Y, Amitai G, Sorek R, Kranzusch PJ. Bacterial gasdermins reveal an ancient mechanism of cell death. Science 2022; 375:221-225. [PMID: 35025633 DOI: 10.1126/science.abj8432] [Citation(s) in RCA: 164] [Impact Index Per Article: 54.7] [Indexed: 12/14/2022]
Affiliation(s)
- Alex G Johnson
- Department of Microbiology, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
- Tanita Wein
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Megan L Mayer
- Harvard Center for Cryo-Electron Microscopy, Harvard Medical School, Boston, MA 02115, USA
- Brianna Duncan-Lowey
- Department of Microbiology, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
- Erez Yirmiya
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Gil Amitai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Rotem Sorek
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Philip J Kranzusch
- Department of Microbiology, Harvard Medical School, Boston, MA 02115, USA; Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02115, USA; Parker Institute for Cancer Immunotherapy, Dana-Farber Cancer Institute, Boston, MA 02115, USA
196
Leger MM, Ros-Rocher N, Najle SR, Ruiz-Trillo I. Rel/NF-κB Transcription Factors Emerged at the Onset of Opisthokonts. Genome Biol Evol 2022; 14:6499270. [PMID: 34999783 PMCID: PMC8763368 DOI: 10.1093/gbe/evab289] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 12/27/2021] [Indexed: 12/23/2022]
Abstract
The Rel/NF-κB transcription factor family has myriad roles in immunity, development, and differentiation in animals, and was considered a key innovation for animal multicellularity. Rel homology domain-containing proteins were previously hypothesized to have originated in a last common ancestor of animals and some of their closest unicellular relatives. However, key taxa were missing from previous analyses, necessitating a systematic investigation into the distribution and evolution of these proteins. Here, we address this knowledge gap by surveying taxonomically broad data from eukaryotes, with a special emphasis on lineages closely related to animals. We report an earlier origin for Rel/NF-κB proteins than previously described, in the last common ancestor of animals and fungi, and show that even in the sister group to fungi, these proteins contain elements that in animals are necessary for the subcellular regulation of Rel/NF-κB.
Affiliation(s)
- Michelle M Leger
- Institute of Evolutionary Biology (Consejo Superior de Investigaciones Científicas-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain
- Núria Ros-Rocher
- Institute of Evolutionary Biology (Consejo Superior de Investigaciones Científicas-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain
- Sebastián R Najle
- Institute of Evolutionary Biology (Consejo Superior de Investigaciones Científicas-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain
- Iñaki Ruiz-Trillo
- Institute of Evolutionary Biology (Consejo Superior de Investigaciones Científicas-Universitat Pompeu Fabra), Barcelona, Catalonia, Spain; Department of Genetics, Microbiology and Statistics, Institute for Research on Biodiversity, University of Barcelona, Catalonia, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Catalonia, Spain
197
Yadav NS, Kumar P, Singh I. Structural and functional analysis of protein. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00026-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
198
Kumar G, Srinivasan N, Sandhya S. Profiles of Natural and Designed Protein-Like Sequences Effectively Bridge Protein Sequence Gaps: Implications in Distant Homology Detection. Methods Mol Biol 2022; 2449:149-167. [PMID: 35507261 DOI: 10.1007/978-1-0716-2095-3_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Sequence-based approaches are fundamental to guiding experimental investigations aimed at obtaining structural and/or functional insights into uncharacterized protein families. Powerful profile-based sequence search methods rely on a sequence space continuum to identify non-trivial relationships through homology detection. The computational design of protein-like sequences that serve as "artificial linkers" is useful in identifying relationships between distant members of a structural fold. Such sequences act as intermediates and guide homology searches between distantly related proteins. Here, we describe an approach that represents natural intermediate sequences and designed protein-like sequences as hidden Markov model (HMM) profiles to improve the sensitivity of existing search methods. Searches within the "Profile database" recognized the parent structural fold for 90% of the search queries at query coverage better than 60%. For 1040 protein families with no available structure, fold associations were made through searches in the database of natural and designed sequence profiles. Most of the associations were made with the Alpha-alpha superhelix, Transmembrane beta-barrels, TIM barrel, and Immunoglobulin-like beta-sandwich folds. For 11 domain families of unknown function, we provide confident fold associations using the profiles of designed sequences and a consensus from other fold recognition methods. For two DUFs (domains of unknown function), we performed detailed functional annotation through comparisons with characterized templates from families of known function.
Affiliation(s)
- Gayatri Kumar
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Sankaran Sandhya
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India.
- Department of Biotechnology, Faculty of Life and Allied Health Sciences, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India.
199
Predicting the capsid architecture of phages from metagenomic data. Comput Struct Biotechnol J 2022; 20:721-732. [PMID: 35140890 PMCID: PMC8814770 DOI: 10.1016/j.csbj.2021.12.032] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/01/2021] [Revised: 12/22/2021] [Accepted: 12/22/2021] [Indexed: 12/29/2022]
Abstract
Tailed phages are viruses that infect bacteria and are the most abundant biological entities on Earth. Their ecological, evolutionary, and biogeochemical roles on the planet stem from their genomic diversity. Known tailed phage genomes range from 10 to 735 kilobase pairs, thanks to the size variability of the protective protein capsids that store them. However, the role of tailed phage capsid diversity in ecosystems is unclear. A fundamental gap is the difficulty of associating genomic information with viral capsids in the environment. To address this problem, we introduce a computational approach to predict the capsid architecture (T-number) of tailed phages using the sequence of a single gene, the major capsid protein. This approach relies on an allometric model that relates the genome length and capsid architecture of tailed phages. This allometric model was applied to isolated phage genomes to generate a library that associates major capsid proteins with putative capsid architectures. This library was used to train machine learning methods, and the most computationally scalable model investigated (random forest) was applied to human gut metagenomes. Compared to isolated phages, the analysis of gut data reveals a large abundance of mid-sized (T = 7) capsids, as expected, followed by a relatively high frequency of jumbo-like tailed phage capsids (T ≥ 25) and small capsids (T = 4) that have been under-sampled. We discuss how to increase the method's accuracy and how to extend the approach to other viruses. The computational pipeline introduced here opens the door to monitoring the ongoing evolution and selection of viral capsids across ecosystems.
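The T-numbers quoted in this abstract (T = 4, T = 7, T ≥ 25) come from the Caspar-Klug description of icosahedral capsids, in which the triangulation number is T = h² + hk + k² for non-negative integers h and k, not both zero. The short Python sketch below, using the hypothetical helper name `allowed_t_numbers`, only enumerates these lattice values up to a bound; it is not the authors' allometric model or random-forest pipeline.

```python
def allowed_t_numbers(limit: int) -> list[int]:
    """Return the sorted Caspar-Klug T-numbers T = h^2 + h*k + k^2 that are <= limit.

    h and k are non-negative integers, not both zero (hypothetical helper).
    """
    values = set()
    h = 0
    while h * h <= limit:  # with k = 0, T = h^2, so larger h cannot fit
        k = 0
        while h * h + h * k + k * k <= limit:
            if h or k:  # skip the degenerate (0, 0) lattice vector
                values.add(h * h + h * k + k * k)
            k += 1
        h += 1
    return sorted(values)


if __name__ == "__main__":
    print(allowed_t_numbers(28))
```

For a bound of 28 the sketch yields 1, 3, 4, 7, 9, 12, 13, 16, 19, 21, 25, 27 and 28; for example, T = 7 arises from (h, k) = (2, 1) or (1, 2), and T = 25 from (5, 0) or (0, 5). Values such as 5 or 8 never appear, which is why capsid architectures fall into discrete classes.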
200
Masamba P, Weber BW, Sewell BT, Kappo AP. Crystallization and preliminary structural determination of the universal stress G4LZI3 protein from Schistosoma mansoni. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.101057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]