1
Parra RG, Komives EA, Wolynes PG, Ferreiro DU. Frustration in physiology and molecular medicine. Mol Aspects Med 2025; 103:101362. [PMID: 40273505 DOI: 10.1016/j.mam.2025.101362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2025] [Revised: 03/25/2025] [Accepted: 03/27/2025] [Indexed: 04/26/2025]
Abstract
Molecules provide the ultimate language in terms of which physiology and pathology must be understood. Myriads of proteins participate in elaborate networks of interactions and perform chemical activities coordinating the life of cells. To perform these often amazing tasks, proteins must move and we must think of them as dynamic ensembles of three dimensional structures formed first by folding the polypeptide chains so as to minimize the conflicts between the interactions of their constituent amino acids. It is apparent however that, even when completely folded, not all conflicting interactions have been resolved so the structure remains 'locally frustrated'. Over the last decades it has become clearer that this local frustration is not just a random accident but plays an essential part of the inner workings of protein molecules. We will review here the physical origins of the frustration concept and review evidence that local frustration is important for protein physiology, protein-protein recognition, catalysis and allostery. Also, we highlight examples showing how alterations in the local frustration patterns can be linked to distinct pathologies. Finally we explore the extensions of the impact of frustration in higher order levels of organization of systems including gene regulatory networks and the neural networks of the brain.
Affiliation(s)
- R Gonzalo Parra
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Peter G Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA.
- Diego U Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina; Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales, Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina.
2
Paul M, Banerjee A, Maiti S, Mitra D, DasMohapatra PK, Thatoi H. Evaluation of substrate specificity and catalytic promiscuity of Bacillus albus cellulase: an insight into in silico proteomic study aiming at enhanced production of renewable energy. J Biomol Struct Dyn 2025; 43:3076-3098. [PMID: 38126200 DOI: 10.1080/07391102.2023.2295971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 12/11/2023] [Indexed: 12/23/2023]
Abstract
Cellulases are enzymes that aid in the hydrolysis of cellulosic fibers and have a wide range of industrial uses. In the present in silico study, sequence alignment between cellulases from different Bacillus species revealed that most of the residues are conserved in the aligned enzymes. Three-dimensional structures of cellulase enzymes from 23 different Bacillus species were predicted and, based on the alignment between the modeled structures, the enzymes were categorized into 7 groups according to the homology in their conformational folds. Two structural elements in Gr-I cellulase, namely the β1-α2 and β3-α5 loops, vary greatly in their static position. A molecular docking study between the B. albus cellulase and its various cellulosic substrates, including xylanoglucan oligosaccharides, revealed that residues Phe154, Tyr258, Tyr282, Tyr285, and Tyr376 of B. albus cellulase are significantly involved in forming stacking interactions during enzyme-substrate binding. Residue interaction network and binding energy analysis for the B. albus cellulase with different cellulosic substrates indicated a strong affinity of the XylGlc3 substrate for the receptor enzyme. Molecular interaction and molecular dynamics simulation studies showed structural stability of the enzyme-substrate complexes, which are greatly influenced by the presence of catalytic promiscuity in their substrate binding sites. Screening of B. albus on carboxymethylcellulose (CMC) and xylan supplemented agar media revealed the capability of the bacterium to degrade both cellulose and xylan. Overall, the study demonstrated B. albus cellulase as an effective biocatalyst candidate with the potential role of catalytic promiscuity for possible applications in biofuel industries.
Affiliation(s)
- Manish Paul
- Department of Biotechnology, Maharaja Sriram Chandra Bhanja Deo University, Baripada, India
- Microbiology and Immunology, University of California San Francisco, San Francisco, CA, USA
- Amrita Banerjee
- Oriental Institute of Science and Technology, Midnapore, India
- Smarajit Maiti
- Oriental Institute of Science and Technology, Midnapore, India
- Debanjan Mitra
- Department of Microbiology, Raiganj University, Raiganj, India
- Pradeep K DasMohapatra
- Department of Microbiology, Raiganj University, Raiganj, India
- PAKB Environment Conservation Centre, Raiganj University, Raiganj, India
- Hrudayanath Thatoi
- Department of Biotechnology, Maharaja Sriram Chandra Bhanja Deo University, Baripada, India
3
Choopanian P, Andressoo JO, Mirzaie M. A fast approach for structural and evolutionary analysis based on energetic profile protein comparison. Nat Commun 2025; 16:2231. [PMID: 40044697 PMCID: PMC11882786 DOI: 10.1038/s41467-025-57374-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Accepted: 02/14/2025] [Indexed: 03/09/2025] Open
Abstract
In structural bioinformatics, the efficiency of predicting protein similarity, function, and evolutionary relationships is crucial. Our approach proposed herein leverages protein energy profiles derived from a knowledge-based potential, deviating from traditional methods relying on structural alignment or atomic distances. This method assigns unique energy profiles to individual proteins, facilitating rapid comparative analysis for both structural similarities and evolutionary relationships across various hierarchical levels. Our study demonstrates that energy profiles contain substantial information about protein structure at class, fold, superfamily, and family levels. Notably, these profiles accurately distinguish proteins across species, illustrated by the classification of coronavirus spike glycoproteins and bacteriocin proteins. Introducing a separation measure based on energy profile similarity, our method shows significant correlation with a network-based approach, emphasizing the potential of energy profiles as efficient predictors for drug combinations with faster computational requirements. Our key insight is that the sequence-based energy profile strongly correlates with structure-derived energy, enabling rapid and efficient protein comparisons based solely on sequences.
Affiliation(s)
- Peyman Choopanian
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jaan-Olle Andressoo
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
- Division of Neurogeriatrics, Department of Neurobiology, Care Sciences and Society (NVS), Karolinska Institutet, Stockholm, Sweden.
- Mehdi Mirzaie
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
4
Parra RG, Komives EA, Wolynes PG, Ferreiro DU. Frustration In Physiology And Molecular Medicine. arXiv 2025; arXiv:2502.03851v1. [PMID: 39975445 PMCID: PMC11838788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Molecules provide the ultimate language in terms of which physiology and pathology must be understood. Myriads of proteins participate in elaborate networks of interactions and perform chemical activities coordinating the life of cells. To perform these often amazing tasks, proteins must move and we must think of them as dynamic ensembles of three dimensional structures formed first by folding the polypeptide chains so as to minimize the conflicts between the interactions of their constituent amino acids. It is apparent however that, even when completely folded, not all conflicting interactions have been resolved so the structure remains 'locally frustrated'. Over the last decades it has become clearer that this local frustration is not just a random accident but plays an essential part of the inner workings of protein molecules. We will review here the physical origins of the frustration concept and review evidence that local frustration is important for protein physiology, protein-protein recognition, catalysis and allostery. Also, we highlight examples showing how alterations in the local frustration patterns can be linked to distinct pathologies. Finally we explore the extensions of the impact of frustration in higher order levels of organization of systems including gene regulatory networks and the neural networks of the brain.
Affiliation(s)
- R. Gonzalo Parra
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Peter G. Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Diego U. Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina
- Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales, Consejo Nacional de Investigaciones Científicas y Técnicas - Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina
5
Chowdhury S, Fong SS, Uetz P. The protein interactome of Escherichia coli carbohydrate metabolism. PLoS One 2025; 20:e0315240. [PMID: 39903745 PMCID: PMC11793828 DOI: 10.1371/journal.pone.0315240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 11/21/2024] [Indexed: 02/06/2025] Open
Abstract
We investigate how protein-protein interactions (PPIs) can regulate carbohydrate metabolism in Escherichia coli. We specifically investigated the stoichiometry of 378 PPIs involving carbohydrate metabolic enzymes. In 48 interactions, the interactors were much more abundant than the enzyme and are thus likely to affect enzyme activity and carbohydrate metabolism. Many of these PPIs are conserved across thousands of bacteria including pathogens and microbial species. E. coli adapts to different cellular environments by adjusting the quantities of the interacting proteins (25 PPIs) in a way that the protein-enzyme interaction (PEI) is a likely mechanism to regulate its metabolism in specific environments. We predict 3 PPIs (RpsB-AdhE, DcyD-NanE and MinE-Yccx) previously not known to regulate metabolism.
Affiliation(s)
- Shomeek Chowdhury
- Center for Integrative Life Sciences Education, Virginia Commonwealth University, Richmond, VA, United States of America
- Stephen S. Fong
- Center for Integrative Life Sciences Education, Virginia Commonwealth University, Richmond, VA, United States of America
- Peter Uetz
- Center for Biological Data Science, School of Life Sciences, Virginia Commonwealth University, Richmond, VA, United States of America
6
Cocco S, Posani L, Monasson R. Functional effects of mutations in proteins can be predicted and interpreted by guided selection of sequence covariation information. Proc Natl Acad Sci U S A 2024; 121:e2312335121. [PMID: 38889151 PMCID: PMC11214004 DOI: 10.1073/pnas.2312335121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 04/21/2024] [Indexed: 06/20/2024] Open
Abstract
Predicting the effects of one or more mutations on the in vivo or in vitro properties of a wild-type protein is a major computational challenge, due to the presence of epistasis, that is, of interactions between amino acids in the sequence. We introduce a computationally efficient procedure to build minimal epistatic models to predict mutational effects by combining evolutionary (homologous sequence) and few mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and the edges are inferred from sequence data. We show, on 10 mutational scans, that our pipeline exhibits performance comparable to state-of-the-art deep networks trained on many more data, while requiring far fewer parameters and hence being more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the fitness or biochemical property experimentally measured, mostly focus on key functional sites, and are not necessarily related to structural contacts. Therefore, our method is able to extract information relevant for one mutational experiment from homologous sequence data reflecting the multitude of structural and functional constraints acting on proteins throughout evolution.
Affiliation(s)
- Simona Cocco
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Lorenzo Posani
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Rémi Monasson
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
7
Freiberger MI, Ruiz-Serra V, Pontes C, Romero-Durana M, Galaz-Davison P, Ramírez-Sarmiento CA, Schuster CD, Marti MA, Wolynes PG, Ferreiro DU, Parra RG, Valencia A. Local energetic frustration conservation in protein families and superfamilies. Nat Commun 2023; 14:8379. [PMID: 38104123 PMCID: PMC10725452 DOI: 10.1038/s41467-023-43801-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/23/2023] [Accepted: 11/20/2023] [Indexed: 12/19/2023] Open
Abstract
Energetic local frustration offers a biophysical perspective to interpret the effects of sequence variability on protein families. Here we present a methodology to analyze local frustration patterns within protein families and superfamilies that allows us to uncover constraints related to stability and function, and identify differential frustration patterns in families with a common ancestry. We analyze these signals in very well studied protein families such as PDZ, SH3, α and β globins and RAS families. Recent advances in protein structure prediction make it possible to analyze a vast majority of the protein space. An automatic and unsupervised proteome-wide analysis on the SARS-CoV-2 virus demonstrates the potential of our approach to enhance our understanding of the natural phenotypic diversity of protein families beyond single protein instances. We apply our method to modify biophysical properties of natural proteins based on their family properties, as well as perform unsupervised analysis of large datasets to shed light on the physicochemical signatures of poorly characterized proteins such as the ones belonging to emergent pathogens.
Affiliation(s)
- Maria I Freiberger
- Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, C1428EGA, Argentina
- Victoria Ruiz-Serra
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Camila Pontes
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Miguel Romero-Durana
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Pablo Galaz-Davison
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine, and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, 7820436, Chile
- ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Santiago, 8331150, Chile
- Cesar A Ramírez-Sarmiento
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine, and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, 7820436, Chile
- ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Santiago, 8331150, Chile
- Claudio D Schuster
- Laboratorio de Bioinformática, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA, Buenos Aires, Argentina
- Marcelo A Marti
- Laboratorio de Bioinformática, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA, Buenos Aires, Argentina
- Peter G Wolynes
- Center for Theoretical Biological Physics and Department of Chemistry, Rice University, Houston, TX, 77005, USA
- Diego U Ferreiro
- Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, C1428EGA, Argentina
- R Gonzalo Parra
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain.
- Alfonso Valencia
- Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
8
Tu W, Zheng C, Zheng Y, Feng Z, Lin H, Jiang Y, Chen W, Chen Y, Lee Y, Su J, Zheng W. The investigation of interaction and chaperon-like activity of α-synuclein as a protein in pathophysiology of Parkinson's disease upon direct interaction with tectorigenin. Int J Biol Macromol 2023; 249:125702. [PMID: 37414324 DOI: 10.1016/j.ijbiomac.2023.125702] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/16/2023] [Revised: 07/02/2023] [Accepted: 07/03/2023] [Indexed: 07/08/2023]
Abstract
Analyzing the therapeutic potential of a biomolecule requires an understanding of how it may interact with proteins and modify their corresponding functions. α-Synuclein is a protein that is widely involved in the pathogenesis of Parkinson's disease (PD) and shows chaperon-like activity. We selected tectorigenin, a very common methoxyisoflavone extracted from plants, from among bioactive molecules that are documented to have different therapeutic effects. Herein, we aimed to explore how tectorigenin interacts with α-synuclein in vitro by mimicking the physiological environment. Spectroscopic and theoretical studies, including molecular docking simulation, were used to examine the effects of tectorigenin on the conformation and dynamics of α-synuclein. Tectorigenin was shown to quench the protein emission spectra through a mixed static-dynamic quenching mechanism. Furthermore, tectorigenin binding to α-synuclein was shown to lead to microenvironmental changes in the tertiary structure of the protein, whereas the protein's secondary structure was almost unchanged. It was also deduced that tectorigenin confers thermal stability on the α-synuclein structure, as evidenced by less perturbation of α-synuclein secondary structure upon elevation of temperature in the presence of tectorigenin relative to the free form. Molecular docking simulation demonstrated that non-covalent interactions, mainly hydrogen bonds, had a key role in the interaction and stabilization of α-synuclein in the presence of tectorigenin. Moreover, the chaperon-like activity of α-synuclein against two model proteins, βL-crystallin and catalase, was improved in the presence of tectorigenin. The findings showed that tectorigenin can lead to stabilization of α-synuclein and may be used as a therapeutic agent in the prevention of neurodegenerative diseases.
Affiliation(s)
- Wenzhan Tu
- Rehabilitation Medicine Center, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang 325027, China; Integrative & Optimized Medicine Research Center, China-USA Institute for Acupuncture and Rehabilitation, Wenzhou Medical University, Wenzhou, Zhejiang 325027, China
- Cheng Zheng
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Yuyin Zheng
- Rehabilitation Medicine Center, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang 325027, China; Integrative & Optimized Medicine Research Center, China-USA Institute for Acupuncture and Rehabilitation, Wenzhou Medical University, Wenzhou, Zhejiang 325027, China
- Zhenhua Feng
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Haiyan Lin
- Rehabilitation Medicine Center, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang 325027, China; Integrative & Optimized Medicine Research Center, China-USA Institute for Acupuncture and Rehabilitation, Wenzhou Medical University, Wenzhou, Zhejiang 325027, China
- Yiwei Jiang
- Alberta Institute, Wenzhou Medical University, Wenzhou 325000, China
- WangChao Chen
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Yuhan Chen
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
- Yang Lee
- Second affiliation of Nanjing Medical University, Nanjing, Jiangsu Province, China
- Jianzhong Su
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China.
- Wu Zheng
- Rehabilitation Medicine Center, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang 325027, China; Integrative & Optimized Medicine Research Center, China-USA Institute for Acupuncture and Rehabilitation, Wenzhou Medical University, Wenzhou, Zhejiang 325027, China.
9
Dixit R, Khambhati K, Supraja KV, Singh V, Lederer F, Show PL, Awasthi MK, Sharma A, Jain R. Application of machine learning on understanding biomolecule interactions in cellular machinery. Bioresour Technol 2023; 370:128522. [PMID: 36565819 DOI: 10.1016/j.biortech.2022.128522] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/29/2022] [Revised: 12/17/2022] [Accepted: 12/20/2022] [Indexed: 06/17/2023]
Abstract
Machine learning (ML) applications have become ubiquitous in all fields of research including protein science and engineering. Apart from protein structure and mutation prediction, scientists are focusing on knowledge gaps with respect to the molecular mechanisms involved in protein binding and interactions with other components in experimental setups or the human body. Researchers are working on several wet-lab techniques and generating data for a better understanding of the concepts and mechanics involved. The resulting information, such as biomolecular structures, binding affinities, structural fluctuations and movements, is enormous and can be handled and analyzed by ML. Therefore, this review highlights the significance of ML in understanding biomolecular interactions while assisting in various fields of research such as drug discovery, nanomedicine, nanotoxicity and material science. Hence, the way ahead would be for laboratory work and computational techniques to go hand in hand.
Affiliation(s)
- Rewati Dixit
- Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Khushal Khambhati
- Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Kolli Venkata Supraja
- Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Vijai Singh
- Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Franziska Lederer
- Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner Landstrasse 400, 01328 Dresden, Germany
- Pau-Loke Show
- Zhejiang Provincial Key Laboratory for Subtropical Water Environment and Marine Biological Resources Protection, Wenzhou University, Wenzhou 325035, China; Department of Sustainable Engineering, Saveetha School of Engineering, SIMATS, Chennai 602105, India; Department of Chemical and Environmental Engineering, University of Nottingham, Malaysia, 43500 Semenyih, Selangor Darul Ehsan, Malaysia
- Mukesh Kumar Awasthi
- College of Natural Resources and Environment, Northwest A&F University, Yangling 712100, China
- Abhinav Sharma
- Institute Theory of Polymers, Leibniz Institute for Polymer Research, Hohe Strasse 6, 01069 Dresden, Germany
- Rohan Jain
- Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner Landstrasse 400, 01328 Dresden, Germany.
10
Orouji E, Raman AT. Computational methods to explore chromatin state dynamics. Brief Bioinform 2022; 23:6751148. [PMID: 36208178 PMCID: PMC9677473 DOI: 10.1093/bib/bbac439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 08/25/2022] [Accepted: 09/09/2022] [Indexed: 12/14/2022] Open
Abstract
The human genome is marked by several singular and combinatorial histone modifications that shape the different states of chromatin and its three-dimensional organization. Genome-wide mapping of these marks as well as histone variants and open chromatin regions is commonly carried out via profiling DNA-protein binding or via chromatin accessibility methods. After the generation of epigenomic datasets in a cell type, statistical models can be used to annotate the noncoding regions of DNA and infer the combinatorial histone marks or chromatin states (CS). These methods involve partitioning the genome and labeling individual segments based on their CS patterns. Chromatin labels enable the systematic discovery of genomic function and activity and can label the gene body, promoters or enhancers without using other genomic maps. CSs are dynamic and change under different cell conditions, such as in normal, preneoplastic or tumor cells. This review aims to explore the available computational tools that have been developed to capture CS alterations under two or more cellular conditions.
Affiliation(s)
- Elias Orouji
- Corresponding author: Elias Orouji, Epigenomics Lab, Princess Margaret Cancer Centre, University Health Network (UHN), 101 College St., Toronto, ON M5G 1L7, Canada. Tel: +1 (917) 647-2202
- Ayush T Raman
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Cambridge, Massachusetts, USA
11
Calmodulin in Paramecium: Focus on Genomic Data. Microorganisms 2022; 10:microorganisms10101915. [PMID: 36296191 PMCID: PMC9608856 DOI: 10.3390/microorganisms10101915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 09/14/2022] [Accepted: 09/22/2022] [Indexed: 11/26/2022] Open
Abstract
Calcium (Ca2+) is a universal second messenger that plays a key role in cellular signaling. However, Ca2+ signals are transduced with the help of Ca2+-binding proteins, which serve as sensors, transducers, and elicitors. Among the collection of these Ca2+-binding proteins, calmodulin (CaM) emerged as the prototypical model in eukaryotic cells. This is a small protein that binds four Ca2+ ions and whose functions are multiple, controlling many essential aspects of cell physiology. CaM is universally distributed in eukaryotes, from multicellular organisms, such as human and land plants, to unicellular microorganisms, such as yeasts and ciliates. Here, we review most of the information gathered on CaM in Paramecium, a group of ciliates. We condense the information here by mentioning that mature Paramecium CaM is a 148 amino acid-long protein codified by a single gene, as in other eukaryotic microorganisms. In these ciliates, the protein is notoriously localized and regulates cilia function and can stimulate the activity of some enzymes. When Paramecium CaM is mutated, cells show flawed locomotion and/or exocytosis. We further widen this and additional information in the text, focusing on genomic data.
12
David S, Dorado G, Duarte EL, David-Bosne S, Trigueiro-Louro J, Rebelo-de-Andrade H. COVID-19: impact on Public Health and hypothesis-driven investigations on genetic susceptibility and severity. Immunogenetics 2022; 74:381-407. [PMID: 35348847 PMCID: PMC8961091 DOI: 10.1007/s00251-022-01261-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/10/2022] [Accepted: 03/14/2022] [Indexed: 12/12/2022]
Abstract
COVID-19 is a new complex multisystem disease caused by the novel coronavirus SARS-CoV-2. In slightly over 2 years, it infected nearly 500 million and killed 6 million human beings worldwide, causing an unprecedented coronavirus pandemic. Currently, the international scientific community is engaged in elucidating the molecular mechanisms of the pathophysiology of SARS-CoV-2 infection as a basis of scientific developments for the future control of COVID-19. Global exome and genome analysis efforts work to define the human genetics of protective immunity to SARS-CoV-2 infection. Here, we review the current knowledge regarding the SARS-CoV-2 infection, the implications of COVID-19 to Public Health and discuss genotype to phenotype association approaches that could be exploited through the selection of candidate genes to identify the genetic determinants of severe COVID-19.
Affiliation(s)
- Susana David
  - Departamento de Genética Humana, Instituto Nacional de Saúde Doutor Ricardo Jorge (INSA, IP), Lisboa, Portugal
  - Instituto de Investigação do Medicamento (iMed.ULisboa), Faculdade de Farmácia, Universidade de Lisboa, Lisboa, Portugal
- Guillermo Dorado
  - Atlántida Centro de Investigación y Desarrollo de Estudios Profesionales (CIDEP), Granada, Spain
- Elsa L Duarte
  - MED-Instituto Mediterrâneo para a Agricultura, Ambiente e Desenvolvimento, Escola de Ciências e Tecnologia, Universidade de Évora, Évora, Portugal
- João Trigueiro-Louro
  - Departamento de Doenças Infeciosas, INSA, IP, Lisboa, Portugal
  - Host-Pathogen Interaction Unit, Instituto de Investigação do Medicamento (iMed.ULisboa), Faculdade de Farmácia, Universidade de Lisboa, Lisboa, Portugal
  - Hospital Egas Moniz, Centro Hospitalar Lisboa Ocidental, Lisboa, Portugal
- Helena Rebelo-de-Andrade
  - Departamento de Doenças Infeciosas, INSA, IP, Lisboa, Portugal
  - Host-Pathogen Interaction Unit, Instituto de Investigação do Medicamento (iMed.ULisboa), Faculdade de Farmácia, Universidade de Lisboa, Lisboa, Portugal

13
Aptekmann AA, Buongiorno J, Giovannelli D, Glamoclija M, Ferreiro DU, Bromberg Y. mebipred: identifying metal binding potential in protein sequence. Bioinformatics 2022; 38:3532-3540. [PMID: 35639953 PMCID: PMC9272798 DOI: 10.1093/bioinformatics/btac358] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/06/2021] [Revised: 03/27/2022] [Accepted: 05/22/2022] [Indexed: 11/23/2022] Open
Abstract
Motivation: Metal-binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that are used for a variety of needs, such as catalysis, DNA/RNA binding, protein structure stability, etc. Identifying metal-binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal-binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability. Results: We developed a novel machine learning-based method, mebipred, for identifying metal-binding proteins from sequence-derived features. This method is over 80% accurate in recognizing proteins that bind metal ion-containing ligands; the specific identity of 11 ubiquitously present metal ions can also be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and is thus faster than alignment-based methods; it is also more accurate than other sequence-based prediction methods. Additionally, mebipred can identify protein metal-binding capabilities from short sequence stretches, e.g. translated sequencing reads, and, thus, may be useful for the annotation of metal requirements of metagenomic samples. We performed an analysis of available microbiome data and found that ocean, hot spring sediments and soil microbiomes use a more diverse set of metals than human host-related ones. For human microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal-binding proteins. These results highlight mebipred’s utility in analyzing microbiome metal requirements. Availability and implementation: mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- A A Aptekmann
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ, 08873, USA
  - Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, 08901, USA
- D Giovannelli
  - Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, 08901, USA
  - Department of Biology, University of Naples Federico II, Naples, Italy
  - Institute for Marine Biological Resources and Biotechnology-IRBIM, National Research Council of Italy, CNR, Ancona, Italy
- M Glamoclija
  - Department of Earth and Environmental Sciences, Rutgers University, New Brunswick, NJ, 07102, USA
- D U Ferreiro
  - Protein Physiology Lab, Departamento de Quimica Biologica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires, 1428, Argentina
- Y Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ, 08873, USA

14
Pazos F. Computational prediction of protein functional sites-Applications in biotechnology and biomedicine. Advances in Protein Chemistry and Structural Biology 2022; 130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/14/2023]
Abstract
There are many computational approaches for predicting protein functional sites based on different sequence and structural features. These methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. They complement the more expensive and time-consuming experimental approaches by pointing them to possible candidate positions. In many cases they are jointly used to characterize the functional sites in proteins of biotechnological and biomedical interest and eventually modify them for different purposes. There is a clear trend towards approaches based on machine learning and those using structural information, due to the recent developments in these areas. Nevertheless, "classic" methods based on sequence and evolutionary features are still playing an important role as these features are strongly related to functionality. In this review, the main approaches for predicting general functional sites in a protein are discussed, with a focus on sequence-based approaches.
Affiliation(s)
- Florencio Pazos
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Madrid, Spain

15
Lautens MJ, Tan JH, Serrat X, Del Borrello S, Schertzberg MR, Fraser AG. Identification of enzymes that have helminth-specific active sites and are required for Rhodoquinone-dependent metabolism as targets for new anthelmintics. PLoS Negl Trop Dis 2021; 15:e0009991. [PMID: 34843467 PMCID: PMC8659336 DOI: 10.1371/journal.pntd.0009991] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/11/2021] [Revised: 12/09/2021] [Accepted: 11/11/2021] [Indexed: 11/18/2022] Open
Abstract
Soil transmitted helminths (STHs) are major human pathogens that infect over a billion people. Resistance to current anthelmintics is rising and new drugs are needed. Here we combine multiple approaches to find druggable targets in the anaerobic metabolic pathways STHs need to survive in their mammalian host. These require rhodoquinone (RQ), an electron carrier used by STHs and not their hosts. We identified 25 genes predicted to act in RQ-dependent metabolism including sensing hypoxia and RQ synthesis and found 9 are required. Since all 9 have mammalian orthologues, we used comparative genomics and structural modeling to identify those with active sites that differ between host and parasite. Together, we found 4 genes that are required for RQ-dependent metabolism and have different active sites. Finding these high confidence targets can open up in silico screens to identify species selective inhibitors of these enzymes as new anthelmintics.
Affiliation(s)
- Margot J. Lautens
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- June H. Tan
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Xènia Serrat
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Andrew G. Fraser
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada

16
Prediction of Metal Ion Binding Sites of Transmembrane Proteins. Computational and Mathematical Methods in Medicine 2021; 2021:2327832. [PMID: 34721655 PMCID: PMC8556105 DOI: 10.1155/2021/2327832] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/17/2021] [Accepted: 10/01/2021] [Indexed: 12/22/2022]
Abstract
The metal ion binding of transmembrane proteins (TMPs) plays a fundamental role in biological processes, pharmaceutics, and medicine, but it is hard to obtain enough TMP structures with experimental techniques to discover their binding mechanisms comprehensively. To predict the metal ion binding sites of TMPs on a large scale, we present a simple and effective two-stage prediction method, TMP-MIBS, to identify the corresponding binding residues using TMP sequences. At present, there is no specific research on the metal ion binding prediction of TMPs. Therefore, we compared our model with published tools that do not distinguish TMPs from water-soluble proteins. The results on the independent verification dataset show that TMP-MIBS has superior performance. This paper explores the interaction mechanism between TMPs and metal ions, which helps in understanding the structure and function of TMPs and is of great significance for further elucidating transport mechanisms and identifying potential drug targets.

17
TwinCons: Conservation score for uncovering deep sequence similarity and divergence. PLoS Comput Biol 2021; 17:e1009541. [PMID: 34714829 PMCID: PMC8580257 DOI: 10.1371/journal.pcbi.1009541] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/16/2021] [Revised: 11/10/2021] [Accepted: 10/06/2021] [Indexed: 11/19/2022] Open
Abstract
We have developed the program TwinCons, to detect noisy signals of deep ancestry of proteins or nucleic acids. As input, the program uses a composite alignment containing pre-defined groups, and mathematically determines a 'cost' of transforming one group to the other at each position of the alignment. The output distinguishes conserved, variable and signature positions. A signature is conserved within groups but differs between groups. The method automatically detects continuous characteristic stretches (segments) within alignments. TwinCons provides a convenient representation of conserved, variable and signature positions as a single score, enabling the structural mapping and visualization of these characteristics. Structure is more conserved than sequence. TwinCons highlights alternative sequences of conserved structures. Using TwinCons, we detected highly similar segments between proteins from the translation and transcription systems. TwinCons detects conserved residues within regions of high functional importance for the ribosomal RNA (rRNA) and demonstrates that signatures are not confined to specific regions but are distributed across the rRNA structure. The ability to evaluate both nucleic acid and protein alignments allows TwinCons to be used in combined sequence and structural analysis of signatures and conservation in rRNA and in ribosomal proteins (rProteins). TwinCons detects a strong sequence conservation signal between bacterial and archaeal rProteins related by circular permutation. This conserved sequence is structurally colocalized with conserved rRNA, indicated by TwinCons scores of rRNA alignments of bacterial and archaeal groups. This combined analysis revealed deep co-evolution of rRNA and rProtein buried within the deepest branching points in the tree of life.
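The distinction this abstract draws between conserved, variable and signature positions can be illustrated with a minimal Python sketch. It is not the TwinCons scoring scheme, which the abstract describes only as a per-position transformation 'cost'; the function name `classify_positions`, the majority-residue rule, and the 0.8 threshold are all assumptions made here for illustration.

```python
from collections import Counter


def classify_positions(group_a, group_b, threshold=0.8):
    """Label each alignment column as 'conserved', 'signature', or 'variable'.

    group_a and group_b are lists of equal-length aligned sequences.
    The threshold is a hypothetical cutoff on the majority-residue frequency.
    """
    def consensus(column):
        # Most frequent residue in the column and its relative frequency.
        residue, count = Counter(column).most_common(1)[0]
        return residue, count / len(column)

    labels = []
    # zip(*group) turns a list of sequences into a sequence of columns.
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        res_a, freq_a = consensus(col_a)
        res_b, freq_b = consensus(col_b)
        if freq_a >= threshold and freq_b >= threshold:
            # Conserved within both groups: same residue means conserved,
            # different residues mean a signature position.
            labels.append("conserved" if res_a == res_b else "signature")
        else:
            labels.append("variable")
    return labels


# Toy alignment (invented data): column 2 differs between groups, column 4 varies.
group_a = ["MKLV", "MKLI", "MKLA"]
group_b = ["MRLG", "MRLS", "MRLT"]
labels = classify_positions(group_a, group_b)
print(labels)
```

A real conservation score would weight substitutions by physicochemical similarity instead of treating residues as simply equal or different; the sketch keeps only the within-group versus between-group logic.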

18
Blondel L, Besse S, Rivard EL, Ylla G, Extavour CG. Evolution of a cytoplasmic determinant: evidence for the biochemical basis of functional evolution of the novel germ line regulator oskar. Mol Biol Evol 2021; 38:5491-5513. [PMID: 34550378 PMCID: PMC8662646 DOI: 10.1093/molbev/msab284] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022] Open
Abstract
Germ line specification is essential in sexually reproducing organisms. Despite their critical role, the evolutionary history of the genes that specify animal germ cells is heterogeneous and dynamic. In many insects, the gene oskar is required for the specification of the germ line. However, the germ line role of oskar is thought to be a derived role resulting from co-option from an ancestral somatic role. To address how evolutionary changes in protein sequence could have led to changes in the function of Oskar protein that enabled it to regulate germ line specification, we searched for oskar orthologs in 1,565 publicly available insect genomic and transcriptomic data sets. The earliest-diverging lineage in which we identified an oskar ortholog was the order Zygentoma (silverfish and firebrats), suggesting that oskar originated before the origin of winged insects. We noted some order-specific trends in oskar sequence evolution, including whole gene duplications, clade-specific losses, and rapid divergence. An alignment of all known 379 Oskar sequences revealed new highly conserved residues as candidates that promote dimerization of the LOTUS domain. Moreover, we identified regions of the OSK domain with conserved predicted RNA binding potential. Furthermore, we show that despite a low overall amino acid conservation, the LOTUS domain shows higher conservation of predicted secondary structure than the OSK domain. Finally, we suggest new key amino acids in the LOTUS domain that may be involved in the previously reported Oskar−Vasa physical interaction that is required for its germ line role.
Affiliation(s)
- Leo Blondel
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Savandara Besse
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Emily L Rivard
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
- Guillem Ylla
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Cassandra G Extavour
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA

19
Cortal A, Martignetti L, Six E, Rausell A. Gene signature extraction and cell identity recognition at the single-cell level with Cell-ID. Nat Biotechnol 2021; 39:1095-1102. [PMID: 33927417 DOI: 10.1038/s41587-021-00896-6] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 08/05/2020] [Accepted: 03/15/2021] [Indexed: 02/08/2023]
Abstract
Because of the stochasticity associated with high-throughput single-cell sequencing, current methods for exploring cell-type diversity rely on clustering-based computational approaches in which heterogeneity is characterized at cell subpopulation rather than at full single-cell resolution. Here we present Cell-ID, a clustering-free multivariate statistical method for the robust extraction of per-cell gene signatures from single-cell sequencing data. We applied Cell-ID to data from multiple human and mouse samples, including blood cells, pancreatic islets and airway, intestinal and olfactory epithelium, as well as to comprehensive mouse cell atlas datasets. We demonstrate that Cell-ID signatures are reproducible across different donors, tissues of origin, species and single-cell omics technologies, and can be used for automatic cell-type annotation and cell matching across datasets. Cell-ID improves biological interpretation at individual cell level, enabling discovery of previously uncharacterized rare cell types or cell states. Cell-ID is distributed as an open-source R software package.
Affiliation(s)
- Akira Cortal
  - Clinical Bioinformatics Laboratory, Université de Paris, INSERM UMR1163, Imagine Institute, Paris, France
- Loredana Martignetti
  - Clinical Bioinformatics Laboratory, Université de Paris, INSERM UMR1163, Imagine Institute, Paris, France
- Emmanuelle Six
  - Laboratory of Human Lymphohematopoiesis, Université de Paris, INSERM UMR1163, Imagine Institute, Paris, France
- Antonio Rausell
  - Clinical Bioinformatics Laboratory, Université de Paris, INSERM UMR1163, Imagine Institute, Paris, France
  - Molecular Genetics Service, AP-HP, Necker Hospital for Sick Children, Paris, France

20
Karakulak T, Rifaioglu AS, Rodrigues JPGLM, Karaca E. Predicting the Specificity-Determining Positions of Receptor Tyrosine Kinase Axl. Front Mol Biosci 2021; 8:658906. [PMID: 34195226 PMCID: PMC8236827 DOI: 10.3389/fmolb.2021.658906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 11/22/2022] Open
Abstract
Owing to its clinical significance, modulation of functionally relevant amino acids in protein-protein complexes has attracted a great deal of attention. To this end, many approaches have been proposed to predict the partner-selecting amino acid positions in evolutionarily close complexes. These approaches can be grouped into sequence-based machine learning and structure-based energy-driven methods. In this work, we assessed these methods’ ability to map the specificity-determining positions of Axl, a receptor tyrosine kinase involved in cancer progression and immune system diseases. For sequence-based predictions, we used SDPpred, Multi-RELIEF, and Sequence Harmony. For structure-based predictions, we utilized HADDOCK refinement and molecular dynamics simulations. As a result, we observed that (i) sequence-based methods overpredict partner-selecting residues of Axl and that (ii) combining Multi-RELIEF with HADDOCK-based predictions provides the key Axl residues, covered by the extensive molecular dynamics simulations. Expanding on these results, we propose that a sequence-structure-based approach is necessary to determine specificity-determining positions of Axl, which can guide the development of therapeutic molecules to combat Axl misregulation.
Affiliation(s)
- Tülay Karakulak
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmet Sureyya Rifaioglu
  - Department of Electrical-Electronics Engineering, İskenderun Technical University, Hatay, Turkey
- João P G L M Rodrigues
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, United States
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey

21
Pitarch B, Ranea JAG, Pazos F. Protein residues determining interaction specificity in paralogous families. Bioinformatics 2021; 37:1076-1082. [PMID: 33135068 DOI: 10.1093/bioinformatics/btaa934] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/12/2020] [Revised: 10/06/2020] [Accepted: 10/22/2020] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: Predicting the residues controlling a protein's interaction specificity is important not only to better understand its interactions but also to design mutations aimed at fine-tuning or swapping them. RESULTS: In this work, we present a methodology that combines sequence information (in the form of multiple sequence alignments) with interactome information to detect such residues in paralogous families of proteins. The interactome is used to define pairwise similarities of interaction contexts for the proteins in the alignment. The method looks for alignment positions with patterns of amino-acid changes reflecting the similarities/differences in the interaction neighborhoods of the corresponding proteins. We tested this new methodology in a large set of human paralogous families with structurally characterized interactions, and discuss in detail the results for the RasH family. We show that this approach is a better predictor of interfacial residues than both sequence conservation and an equivalent 'unsupervised' method that does not use interactome information. AVAILABILITY AND IMPLEMENTATION: http://csbg.cnb.csic.es/pazos/Xdet/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
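The idea summarized in this abstract, scoring each alignment position by how well its pattern of amino-acid changes tracks the similarity of interaction neighborhoods, can be sketched roughly as below. This is not the authors' implementation: the binary residue-identity measure, the Jaccard similarity of partner sets, the Pearson correlation, and every name in the code are assumptions chosen to keep the example short.

```python
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def jaccard(a, b):
    """Jaccard similarity between two sets of interaction partners."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_positions(alignment, partners):
    """Correlate per-position residue identity with interaction-context similarity.

    alignment: dict mapping protein name -> aligned sequence.
    partners: dict mapping protein name -> set of interaction partners.
    """
    names = sorted(alignment)
    pairs = list(combinations(names, 2))
    # Pairwise similarity of interaction neighborhoods (hypothetical measure).
    context = [jaccard(partners[p], partners[q]) for p, q in pairs]
    length = len(alignment[names[0]])
    scores = []
    for i in range(length):
        # 1.0 if the two proteins share the residue at column i, else 0.0.
        identity = [1.0 if alignment[p][i] == alignment[q][i] else 0.0
                    for p, q in pairs]
        scores.append(pearson(identity, context))
    return scores


# Toy family (invented data): P1/P2 share partners, as do P3/P4.
alignment = {"P1": "GAK", "P2": "GAK", "P3": "GSR", "P4": "GTR"}
partners = {"P1": {"X", "Y"}, "P2": {"X", "Y"}, "P3": {"Z"}, "P4": {"Z"}}
scores = score_positions(alignment, partners)
print([round(s, 3) for s in scores])
```

In this toy case the fully conserved first column carries no signal, while the third column, whose residues split exactly along the interaction-partner groups, scores highest.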
Affiliation(s)
- Borja Pitarch
  - Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
- Juan A G Ranea
  - Department of Molecular Biology and Biochemistry, University of Malaga, Malaga 29071, Spain
  - CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
  - Institute of Biomedical Research in Malaga (IBIMA), Malaga, Spain
- Florencio Pazos
  - Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), 28049 Madrid, Spain

22
McCafferty CL, Taylor DW, Marcotte EM. Improving integrative 3D modeling into low- to medium-resolution electron microscopy structures with evolutionary couplings. Protein Sci 2021; 30:1006-1021. [PMID: 33759266 PMCID: PMC8040867 DOI: 10.1002/pro.4067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/17/2021] [Revised: 03/16/2021] [Accepted: 03/16/2021] [Indexed: 12/12/2022]
Abstract
Electron microscopy (EM) continues to provide near-atomic resolution structures for well-behaved proteins and protein complexes. Unfortunately, structures of some complexes are limited to low- to medium-resolution due to biochemical or conformational heterogeneity. Thus, the application of unbiased systematic methods for fitting individual structures into EM maps is important. A method that employs co-evolutionary information obtained solely from sequence data could prove invaluable for quick, confident localization of subunits within these structures. Here, we incorporate the co-evolution of intermolecular amino acids as a new type of distance restraint in the integrative modeling platform in order to build three-dimensional models of atomic structures into EM maps ranging from 10-14 Å in resolution. We validate this method using four complexes of known structure, where we highlight the conservation of intermolecular couplings despite dynamic conformational changes using the BAM complex. Finally, we use this method to assemble the subunits of the bacterial holo-translocon into a model that agrees with previous biochemical data. The use of evolutionary couplings in integrative modeling improves systematic, unbiased fitting of atomic models into medium- to low-resolution EM maps, providing additional information to integrative models lacking in spatial data.
Affiliation(s)
- David W. Taylor
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, USA
  - Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, Texas, USA
  - LIVESTRONG Cancer Institutes, Dell Medical School, Austin, Texas, USA
- Edward M. Marcotte
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, USA
  - Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, Texas, USA

23
Conformation-Specific Inhibitory Anti-MMP-7 Monoclonal Antibody Sensitizes Pancreatic Ductal Adenocarcinoma Cells to Chemotherapeutic Cell Kill. Cancers (Basel) 2021; 13:cancers13071679. [PMID: 33918254 PMCID: PMC8038143 DOI: 10.3390/cancers13071679] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/23/2021] [Revised: 03/23/2021] [Accepted: 03/30/2021] [Indexed: 02/07/2023] Open
Abstract
Matrix metalloproteases (MMPs) undergo post-translational modifications including pro-domain shedding. The activated forms of these enzymes are effective drug targets, but generating potent biological inhibitors against them remains challenging. We report the generation of anti-MMP-7 inhibitory monoclonal antibody (GSM-192), using an alternating immunization strategy with an active site mimicry antigen and the activated enzyme. Our protocol yielded highly selective anti-MMP-7 monoclonal antibody, which specifically inhibits MMP-7's enzyme activity with high affinity (IC50 = 132 ± 10 nM). The atomic model of the MMP-7-GSM-192 Fab complex exhibited antibody binding to unique epitopes at the rim of the enzyme active site, sterically preventing entry of substrates into the catalytic cleft. In human PDAC biopsies, tissue staining with GSM-192 showed characteristic spatial distribution of activated MMP-7. Treatment with GSM-192 in vitro induced apoptosis via stabilization of cell surface Fas ligand and retarded cell migration. Co-treatment with GSM-192 and chemotherapeutics, gemcitabine and oxaliplatin elicited a synergistic effect. Our data illustrate the advantage of precisely targeting catalytic MMP-7 mediated disease specific activity.

24
Bojkova D, McGreig JE, McLaughlin KM, Masterson SG, Antczak M, Widera M, Krähling V, Ciesek S, Wass MN, Michaelis M, Cinatl J. Differentially conserved amino acid positions may reflect differences in SARS-CoV-2 and SARS-CoV behaviour. Bioinformatics 2021; 37:2282-2288. [PMID: 33560365 PMCID: PMC7929367 DOI: 10.1093/bioinformatics/btab094] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/21/2020] [Revised: 12/23/2020] [Accepted: 02/05/2021] [Indexed: 12/18/2022] Open
Abstract
Motivation: SARS-CoV-2 is a novel coronavirus currently causing a pandemic. Here, we performed a combined in-silico and cell culture comparison of SARS-CoV-2 and the closely related SARS-CoV. Results: Many amino acid positions are differentially conserved between SARS-CoV-2 and SARS-CoV, which reflects the discrepancies in virus behaviour, i.e. more effective human-to-human transmission of SARS-CoV-2 and higher mortality associated with SARS-CoV. Variations in the S protein (mediates virus entry) were associated with differences in its interaction with ACE2 (cellular S receptor) and sensitivity to TMPRSS2 (enables virus entry via S cleavage) inhibition. Anti-ACE2 antibodies more strongly inhibited SARS-CoV than SARS-CoV-2 infection, probably due to a stronger SARS-CoV-2 S-ACE2 affinity relative to SARS-CoV S. Moreover, SARS-CoV-2 and SARS-CoV displayed differences in cell tropism. Cellular ACE2 and TMPRSS2 levels did not indicate susceptibility to SARS-CoV-2. In conclusion, we identified genomic variation between SARS-CoV-2 and SARS-CoV that may reflect the differences in their clinical and biological behaviour. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Denisa Bojkova
  - Institute for Medical Virology, University Hospital, Goethe University Frankfurt am Main, Germany
- Jake E McGreig
  - School of Biosciences, University of Kent, Canterbury, UK
- Marek Widera
  - Institute of Virology, Biomedical Research Center (BMFZ), Philipps University Marburg, Germany
- Verena Krähling
  - Institute of Virology, Biomedical Research Center (BMFZ), Philipps University Marburg, Germany
- Sandra Ciesek
  - Institute for Medical Virology, University Hospital, Goethe University Frankfurt am Main, Germany
  - German Center for Infection Research, DZIF, Braunschweig, Germany
- Mark N Wass
  - School of Biosciences, University of Kent, Canterbury, UK
- Jindrich Cinatl
  - Institute for Medical Virology, University Hospital, Goethe University Frankfurt am Main, Germany

25
Buhrman G, Enríquez P, Dillard L, Baer H, Truong V, Grunden AM, Rose RB. Structure, Function, and Thermal Adaptation of the Biotin Carboxylase Domain Dimer from Hydrogenobacter thermophilus 2-Oxoglutarate Carboxylase. Biochemistry 2021; 60:324-345. [PMID: 33464881 DOI: 10.1021/acs.biochem.0c00815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
2-Oxoglutarate carboxylase (OGC), a unique member of the biotin-dependent carboxylase family from the order Aquificales, captures dissolved CO2 via the reductive tricarboxylic acid (rTCA) cycle. Structure and function studies of OGC may facilitate adaptation of the rTCA cycle to increase the level of carbon fixation for biofuel production. Here we compare the biotin carboxylase (BC) domain of Hydrogenobacter thermophilus OGC with the well-studied mesophilic homologues to identify features that may contribute to thermal stability and activity. We report three OGC BC X-ray structures, each bound to bicarbonate, ADP, or ADP-Mg2+, and propose that substrate binding at high temperatures is facilitated by interactions that stabilize the flexible subdomain B in a partially closed conformation. Kinetic measurements with varying ATP and biotin concentrations distinguish two temperature-dependent steps, consistent with biotin's rate-limiting role in organizing the active site. Transition state thermodynamic values derived from the Eyring equation indicate a larger positive ΔH⧧ and a less negative ΔS⧧ compared to those of a previously reported mesophilic homologue. These thermodynamic values are explained by partially rate limiting product release. Phylogenetic analysis of BC domains suggests that OGC diverged prior to Aquificales evolution. The phylogenetic tree identifies mis-annotations of the Aquificales BC sequences, including the Aquifex aeolicus pyruvate carboxylase structure. Notably, our structural data reveal that the OGC BC dimer comprises a "wet" dimerization interface that is dominated by hydrophilic interactions and structural water molecules common to all BC domains and likely facilitates the conformational changes associated with the catalytic cycle. Mutations in the dimerization domain demonstrate that dimerization contributes to thermal stability.
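For reference, the Eyring relation behind the transition-state values mentioned in this abstract can be written as follows. This is the standard textbook form, assuming a transmission coefficient of 1; the symbols k (rate constant), T (absolute temperature), k_B (Boltzmann constant), h (Planck constant) and R (gas constant) do not appear in the abstract and are introduced here for illustration.

\[
k = \frac{k_B T}{h}\,\exp\!\left(\frac{\Delta S^{\ddagger}}{R}\right)\exp\!\left(-\frac{\Delta H^{\ddagger}}{R T}\right)
\qquad\Longleftrightarrow\qquad
\ln\frac{k}{T} = -\frac{\Delta H^{\ddagger}}{R}\cdot\frac{1}{T} + \ln\frac{k_B}{h} + \frac{\Delta S^{\ddagger}}{R}
\]

In the linearized form, a plot of ln(k/T) against 1/T has slope −ΔH⧧/R and intercept ln(k_B/h) + ΔS⧧/R, so a larger positive ΔH⧧ corresponds to a steeper temperature dependence of the rate.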
Affiliation(s)
- Greg Buhrman
  - Department of Molecular & Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695-7622, United States
- Paul Enríquez
  - Department of Molecular & Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695-7622, United States
- Lucas Dillard
  - Department of Molecular & Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695-7622, United States
- Hayden Baer
  - Department of Molecular & Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695-7622, United States
- Vivian Truong
  - Department of Molecular & Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695-7622, United States
- Amy M Grunden
  - Department of Plant & Microbial Biology, North Carolina State University, Raleigh, North Carolina 27695-7612, United States
- Robert B Rose
  - Department of Molecular & Structural Biochemistry, North Carolina State University, Raleigh, North Carolina 27695-7622, United States

26
Pontes C, Ruiz-Serra V, Lepore R, Valencia A. Unraveling the molecular basis of host cell receptor usage in SARS-CoV-2 and other human pathogenic β-CoVs. Comput Struct Biotechnol J 2021; 19:759-766. [PMID: 33456724 PMCID: PMC7802526 DOI: 10.1016/j.csbj.2021.01.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/29/2020] [Revised: 01/07/2021] [Accepted: 01/07/2021] [Indexed: 01/13/2023]
Abstract
The recent emergence of the novel SARS-CoV-2 in China and its rapid spread in the human population has led to a public health crisis worldwide. As with SARS-CoV, horseshoe bats currently represent the most likely candidate animal source for SARS-CoV-2. Yet, the specific mechanisms of cross-species transmission and adaptation to the human host remain unknown. Here we show that the unsupervised analysis of conservation patterns across the β-CoV spike protein family, using sequence information alone, can provide valuable insights into the molecular basis of the specificity of β-CoVs to different host cell receptors. More precisely, our results indicate that host cell receptor usage is encoded in the amino acid sequences of different CoV spike proteins in the form of a set of specificity determining positions (SDPs). Furthermore, by integrating structural data, in silico mutagenesis and coevolution analysis we could elucidate the role of SDPs in mediating ACE2 binding across the Sarbecovirus lineage, either by engaging the receptor through direct intermolecular interactions or by affecting the local environment of the receptor binding motif. Finally, by analyzing coevolving mutations across a paired MSA we were able to identify key intermolecular contacts occurring at the spike-ACE2 interface. These results show that effective mining of the evolutionary records held in the sequence of the spike protein family can help trace the molecular mechanisms behind the evolution and host-receptor adaptation of circulating and future novel β-CoVs.
Key Words
- APC, average product correction
- CoVs, Coronaviruses
- EV, evolutionary rate
- Functional specificity
- MCA, multiple correspondence analysis
- MI, mutual information
- MSA, multiple sequence alignment
- NTD, N-terminal domain
- Phylogenetic analysis
- Protein subfamilies
- RBD, receptor binding domain
- RBM, receptor binding motif
- SARS-CoV-2
- SDPs, specificity determining positions
- Specificity Determining Positions
- Spike protein evolution
- hACE2, human angiotensin converting enzyme 2
Affiliation(s)
- Camila Pontes
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- University of Brasília (UnB), 70910-900, Brasília - DF, Brazil
- Rosalba Lepore
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
27
|
Casas-Pastor D, Diehl A, Fritz G. Coevolutionary Analysis Reveals a Conserved Dual Binding Interface between Extracytoplasmic Function σ Factors and Class I Anti-σ Factors. mSystems 2020; 5:e00310-20. [PMID: 32753504 PMCID: PMC7406223 DOI: 10.1128/msystems.00310-20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/08/2020] [Accepted: 07/17/2020] [Indexed: 11/30/2022]
Abstract
Extracytoplasmic function σ factors (ECFs) are among the most abundant signal transduction mechanisms in bacteria. Among the diverse regulators of ECF activity, class I anti-σ factors are the most important signal transducers in response to internal and external stress conditions. Despite the conserved secondary structure of the class I anti-σ factor domain (ASDI) that binds and inhibits the ECF under noninducing conditions, the binding interface between ECFs and ASDIs is surprisingly variable between the published cocrystal structures. In this work, we provide a comprehensive computational analysis of the ASDI protein family and study the different contact themes between ECFs and ASDIs. To this end, we harness the coevolution of these diverse protein families and predict covarying amino acid residues as likely candidates of an interaction interface. As a result, we find two common binding interfaces linking the first alpha-helix of the ASDI to the DNA-binding region in the σ4 domain of the ECF, and the fourth alpha-helix of the ASDI to the RNA polymerase (RNAP)-binding region of the σ2 domain. The conservation of these two binding interfaces contrasts with the apparent quaternary structure diversity of the ECF/ASDI complexes, partially explaining the high specificity between cognate ECF and ASDI pairs. Furthermore, we suggest that the dual inhibition of RNAP- and DNA-binding interfaces is likely a universal feature of other ECF anti-σ factors, preventing the formation of nonfunctional trimeric complexes between σ/anti-σ factors and RNAP or DNA.
IMPORTANCE In the bacterial world, extracytoplasmic function σ factors (ECFs) are the most widespread family of alternative σ factors, mediating many cellular responses to environmental cues, such as stress. This work uses a computational approach to investigate how these σ factors interact with class I anti-σ factors, the most abundant regulators of ECF activity. By comprehensively classifying the anti-σs into phylogenetic groups and by comparing this phylogeny to that of the cognate ECFs, the study shows how these protein families have coevolved to maintain their interaction over evolutionary time. These results shed light on the common contact residues that link ECFs and anti-σs in different phylogenetic families and set the basis for the rational design of anti-σs to specifically target certain ECFs. This will help to prevent the cross talk between heterologous ECF/anti-σ pairs, allowing their use as orthogonal regulators for the construction of genetic circuits in synthetic biology.
Affiliation(s)
- Delia Casas-Pastor
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-University Marburg, Marburg, Germany
- Angelika Diehl
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-University Marburg, Marburg, Germany
- School of Molecular Sciences, University of Western Australia, Perth, Australia
- Georg Fritz
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-University Marburg, Marburg, Germany
- School of Molecular Sciences, University of Western Australia, Perth, Australia
|
28
|
Domain-mediated interactions for protein subfamily identification. Sci Rep 2020; 10:264. [PMID: 31937869 PMCID: PMC6959277 DOI: 10.1038/s41598-019-57187-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/12/2019] [Accepted: 12/23/2019] [Indexed: 11/24/2022]
Abstract
Within a protein family, proteins with the same domain often exhibit different cellular functions, despite the shared evolutionary history and molecular function of the domain. We hypothesized that domain-mediated interactions (DMIs) may categorize a protein family into subfamilies because the diversified functions of a single domain often depend on interacting partners of domains. Here we systematically identified DMI subfamilies, in which proteins share domains with DMI partners, as well as with various functional and physical interaction networks in individual species. In humans, DMI subfamily members are associated with similar diseases, including cancers, and are frequently co-associated with the same diseases. DMI information relates to the functional and evolutionary subdivisions of human kinases. In yeast, DMI subfamilies contain proteins with similar phenotypic outcomes from specific chemical treatments. Therefore, the systematic investigation here provides insights into the diverse functions of subfamilies derived from a protein family with a link-centric approach and suggests a useful resource for annotating the functions and phenotypic outcomes of proteins.
|
29
|
Tubiana J, Cocco S, Monasson R. Learning Compositional Representations of Interacting Systems with Restricted Boltzmann Machines: Comparative Study of Lattice Proteins. Neural Comput 2019; 31:1671-1717. [DOI: 10.1162/neco_a_01210] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/04/2022]
Abstract
A restricted Boltzmann machine (RBM) is an unsupervised machine learning bipartite graphical model that jointly learns a probability distribution over data and extracts their relevant statistical features. RBMs were recently proposed for characterizing the patterns of coevolution between amino acids in protein sequences and for designing new sequences. Here, we study how the nature of the features learned by RBM changes with its defining parameters, such as the dimensionality of the representations (size of the hidden layer) and the sparsity of the features. We show that for adequate values of these parameters, RBMs operate in a so-called compositional phase in which visible configurations sampled from the RBM are obtained by recombining these features. We then compare the performance of RBM with other standard representation learning algorithms, including principal or independent component analysis (PCA, ICA), autoencoders (AE), variational autoencoders (VAE), and their sparse variants. We show that RBMs, due to the stochastic mapping between data configurations and representations, better capture the underlying interactions in the system and are significantly more robust with respect to sample size than deterministic methods such as PCA or ICA. In addition, this stochastic mapping is not prescribed a priori as in VAE, but learned from data, which allows RBMs to show good performance even with shallow architectures. All numerical results are illustrated on synthetic lattice protein data that share similar statistical features with real protein sequences and for which ground-truth interactions are known.
Affiliation(s)
- Jérôme Tubiana
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS and PSL Research, 75005 Paris, France
- Simona Cocco
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS and PSL Research, 75005 Paris, France
- Rémi Monasson
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS and PSL Research, 75005 Paris, France
|
30
|
Suplatov DA, Kopylov KE, Popova NN, Voevodin VV, Švedas VK. Mustguseal: a server for multiple structure-guided sequence alignment of protein families. Bioinformatics 2019; 34:1583-1585. [PMID: 29309510 DOI: 10.1093/bioinformatics/btx831] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/18/2017] [Accepted: 12/21/2017] [Indexed: 01/23/2023]
Abstract
Motivation
Comparative analysis of homologous proteins in a functionally diverse superfamily is a valuable tool for studying structure-function relationships, but represents a methodological challenge.
Results
The Mustguseal web-server can automatically build large structure-guided sequence alignments of functionally diverse protein families that include thousands of proteins, based on all available information about their structures and sequences in public databases. Superimposition of protein structures is implemented to compare evolutionarily distant relatives, whereas alignment of sequences is used to compare close homologues. The final alignment can be downloaded for local use or operated on-line with the built-in interactive tools, and further submitted to the integrated sister web-servers of Mustguseal to analyze conserved, subfamily-specific and co-evolving residues when studying protein function and regulation, designing improved enzyme variants for practical applications, and designing selective ligands to modulate functional properties of proteins.
Availability and implementation
Freely available on the web at https://biokinet.belozersky.msu.ru/mustguseal.
Contact
vytas@belozersky.msu.ru
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nina N Popova
- Faculty of Computational Mathematics and Cybernetics
- Vladimir V Voevodin
- Faculty of Computational Mathematics and Cybernetics; Research Computing Center of the Lomonosov Moscow State University, Moscow 119991, Russia
- Vytas K Švedas
- Belozersky Institute of Physicochemical Biology; Faculty of Bioengineering and Bioinformatics
|
31
|
Martell HJ, Masterson SG, McGreig JE, Michaelis M, Wass MN. Is the Bombali virus pathogenic in humans? Bioinformatics 2019; 35:3553-3558. [DOI: 10.1093/bioinformatics/btz267] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/15/2018] [Revised: 03/14/2019] [Accepted: 04/15/2019] [Indexed: 11/13/2022]
Abstract
Motivation
The potential of the Bombali virus, a novel Ebolavirus, to cause disease in humans remains unknown. We have previously identified potential determinants of Ebolavirus pathogenicity in humans by analysing the amino acid positions that are differentially conserved (specificity determining positions; SDPs) between human pathogenic Ebolaviruses and the non-pathogenic Reston virus. Here, we include the many Ebolavirus genome sequences that have since become available into our analysis and investigate the amino acid sequence of the Bombali virus proteins at the SDPs that discriminate between human pathogenic and non-human pathogenic Ebolaviruses.
Results
The use of 1408 Ebolavirus genomes (196 in the original analysis) resulted in a set of 166 SDPs (reduced from 180), 146 (88%) of which were retained from the original analysis. This indicates the robustness of our approach and refines the set of SDPs that distinguish human pathogenic Ebolaviruses from Reston virus. At SDPs, Bombali virus shared the majority of amino acids with the human pathogenic Ebolaviruses (63.25%). However, for two SDPs in VP24 (M136L, R139S) that have been proposed to be critical for the lack of Reston virus human pathogenicity because they alter the VP24-karyopherin interaction, the Bombali virus amino acids match those of Reston virus. Thus, Bombali virus may not be pathogenic in humans. Supporting this, no Bombali virus-associated disease outbreaks have been reported, although Bombali virus was isolated from fruit bats cohabitating in close contact with humans, and anti-Ebolavirus antibodies that may indicate contact with Bombali virus have been detected in humans.
Availability and implementation
Data files are available from https://github.com/wasslab/EbolavirusSDPsBioinformatics2019.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Henry J Martell
- Industrial Biotechnology Centre and School of Biosciences, University of Kent, Canterbury, Kent, UK
- Stuart G Masterson
- Industrial Biotechnology Centre and School of Biosciences, University of Kent, Canterbury, Kent, UK
- Jake E McGreig
- Industrial Biotechnology Centre and School of Biosciences, University of Kent, Canterbury, Kent, UK
- Martin Michaelis
- Industrial Biotechnology Centre and School of Biosciences, University of Kent, Canterbury, Kent, UK
- Mark N Wass
- Industrial Biotechnology Centre and School of Biosciences, University of Kent, Canterbury, Kent, UK
|
32
|
Tubiana J, Cocco S, Monasson R. Learning protein constitutive motifs from sequence data. eLife 2019; 8:e39397. [PMID: 30857591 PMCID: PMC6436896 DOI: 10.7554/elife.39397] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 10/03/2018] [Accepted: 02/24/2019] [Indexed: 12/11/2022]
Abstract
Statistical analysis of evolutionarily related protein sequences provides information about their structure, function, and history. We show that Restricted Boltzmann Machines (RBM), designed to learn complex high-dimensional data and their statistical features, can efficiently model protein families from sequence information. Here we apply RBM to 20 protein families, and present detailed results for two short protein domains (Kunitz and WW), one long chaperone protein (Hsp70), and synthetic lattice proteins for benchmarking. The features inferred by the RBM are biologically interpretable: they are related to structure (residue-residue tertiary contacts, extended secondary motifs (α-helices and β-sheets) and intrinsically disordered regions), to function (activity and ligand specificity), or to phylogenetic identity. In addition, we use RBM to design new protein sequences with putative properties by composing and 'turning up' or 'turning down' the different modes at will. Our work therefore shows that RBM are versatile and practical tools that can be used to unveil and exploit the genotype-phenotype relationship for protein families.
Affiliation(s)
- Jérôme Tubiana
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Paris, France
- Simona Cocco
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Paris, France
- Rémi Monasson
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Paris, France
|
33
|
Cui Y, Dong Q, Hong D, Wang X. Predicting protein-ligand binding residues with deep convolutional neural networks. BMC Bioinformatics 2019; 20:93. [PMID: 30808287 PMCID: PMC6390579 DOI: 10.1186/s12859-019-2672-1] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 11/12/2018] [Accepted: 02/07/2019] [Indexed: 02/01/2023]
Abstract
Background
Ligand-binding proteins play key roles in many biological processes. Identification of protein-ligand binding residues is important in understanding the biological functions of proteins. Existing computational methods can be roughly categorized as sequence-based or 3D-structure-based methods. All these methods are based on traditional machine learning. In a series of binding residue prediction tasks, 3D-structure-based methods are widely superior to sequence-based methods. However, due to the great number of proteins with known amino acid sequences, sequence-based methods have considerable room for improvement with the development of deep learning. Therefore, prediction of protein-ligand binding residues with deep learning requires study.
Results
In this study, we propose a new sequence-based approach called DeepCSeqSite for ab initio protein-ligand binding residue prediction. DeepCSeqSite includes a standard edition and an enhanced edition. The classifier of DeepCSeqSite is based on a deep convolutional neural network. Several convolutional layers are stacked on top of each other to extract hierarchical features. The size of the effective context scope is expanded as the number of convolutional layers increases. The long-distance dependencies between residues can be captured by the large effective context scope, and stacking several layers enables the maximum length of dependencies to be precisely controlled. The extracted features are ultimately combined through one-by-one convolution kernels and softmax to predict whether the residues are binding residues. The state-of-the-art ligand-binding method COACH and some of its submethods are selected as baselines. The methods are tested on a set of 151 nonredundant proteins and three extended test sets. Experiments show that the improvement of the Matthews correlation coefficient (MCC) is no less than 0.05. In addition, a training data augmentation method that slightly improves the performance is discussed in this study.
Conclusions
Without using any templates that include 3D-structure data, DeepCSeqSite significantly outperforms existing sequence-based and 3D-structure-based methods, including COACH. Augmentation of the training sets slightly improves the performance. The model, code and datasets are available at https://github.com/yfCuiFaith/DeepCSeqSite.
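The statement that the effective context scope grows with the number of stacked convolutional layers can be made concrete with a short calculation. The sketch below assumes stride-1 one-dimensional convolutions with a common kernel size. The function name and the example kernel size are illustrative assumptions, not the actual DeepCSeqSite hyperparameters.

```python
def effective_context(num_layers, kernel_size, dilation=1):
    """Number of residues seen by one output position.

    Assumes stride-1 1D convolutions that all share the same kernel size
    and dilation; each layer widens the window by (kernel_size - 1) * dilation.
    """
    return 1 + num_layers * (kernel_size - 1) * dilation


# Context width for 1 to 4 stacked layers with a hypothetical kernel size of 5
for layers in range(1, 5):
    print(layers, effective_context(layers, kernel_size=5))
```

Because the window widens by a fixed amount per layer under these assumptions, choosing the depth fixes the maximum dependency length the network can capture.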
Affiliation(s)
- Yifeng Cui
- Faculty of Education, East China Normal University, 3663 N. Zhongshan Rd., Shanghai, 200062, China; School of Data Science & Engineering, East China Normal University, 3663 N. Zhongshan Rd., Shanghai, 200062, China
- Qiwen Dong
- Faculty of Education, East China Normal University, 3663 N. Zhongshan Rd., Shanghai, 200062, China; School of Data Science & Engineering, East China Normal University, 3663 N. Zhongshan Rd., Shanghai, 200062, China
- Daocheng Hong
- School of Data Science & Engineering, East China Normal University, 3663 N. Zhongshan Rd., Shanghai, 200062, China
- Xikun Wang
- The High School Affiliated of Liaoning Normal University, Dalian, China
|
34
|
Agarwal D, Gireesh-Babu P, Pavan-Kumar A, Koringa P, Joshi CG, Gora A, Bhat IA, Chaudhari A. Molecular characterization and expression profiling of 17-beta-hydroxysteroid dehydrogenase 2 and spermatogenesis associated protein 2 genes in endangered catfish, Clarias magur (Hamilton, 1822). Anim Biotechnol 2018; 31:93-106. [PMID: 30570357 DOI: 10.1080/10495398.2018.1545663] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
Abstract
The 17-beta-hydroxysteroid dehydrogenase 2 (17β-HSD2) enzyme regulates steroid levels by the inactivation of estrogen and androgens. Spermatogenesis associated protein 2 (SPATA2) plays a vital role in spermatogenesis in vertebrates including fish. We report cloning and characterization of the full coding sequences (CDS) of the 17β-HSD2 and SPATA2 genes in Clarias magur. The full-length cDNA sequences of 17β-HSD2 and SPATA2 were 1187 bp (ORF 1125 bp) and 1806 bp (ORF 1524 bp), encoding 375 and 508 amino acids, respectively. Signal peptide analysis revealed that SPATA2 is nonsecretory, while 17β-HSD2 is a secretory protein. Hydropathy profiles showed that both proteins are hydrophilic in nature. Tissue distribution of both genes revealed a high mRNA level of SPATA2 in all tissues examined, indicating its wide range of expression. 17β-HSD2 showed higher expression in the preparatory phase than in the spawning phase in ovary, while the opposite was observed in testis. SPATA2 showed significantly higher expression in the preparatory phase than in the spawning phase in both ovary and testis. Administration of Ovatide™ (GnRH analog) resulted in upregulation of SPATA2 expression at 6 and 16 h post-injection, while 17β-HSD2 showed upregulation only at 6 h post-injection. To the best of our knowledge, this is the first report on characterization of 17β-HSD2 and SPATA2 full-length cDNA in catfish.
Affiliation(s)
- Deepak Agarwal
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Fisheries Education (CIFE), Mumbai, Maharashtra, India
- Pathakota Gireesh-Babu
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Fisheries Education (CIFE), Mumbai, Maharashtra, India
- Annam Pavan-Kumar
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Fisheries Education (CIFE), Mumbai, Maharashtra, India
- Prakash Koringa
- Animal Biotechnology Department, College of Veterinary Sciences and Animal Husbandry, Anand Agricultural University, Anand, Gujarat, India
- Chaitanya G Joshi
- Animal Biotechnology Department, College of Veterinary Sciences and Animal Husbandry, Anand Agricultural University, Anand, Gujarat, India
- Adnan Gora
- Central Marine Fisheries Research Institute, Kochi, Kerala, India
- Irfan Ahmad Bhat
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Fisheries Education (CIFE), Mumbai, Maharashtra, India
- Aparna Chaudhari
- Fish Genetics and Biotechnology Division, ICAR-Central Institute of Fisheries Education (CIFE), Mumbai, Maharashtra, India
|
35
|
Co-evolution networks of HIV/HCV are modular with direct association to structure and function. PLoS Comput Biol 2018; 14:e1006409. [PMID: 30192744 PMCID: PMC6145588 DOI: 10.1371/journal.pcbi.1006409] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/18/2018] [Revised: 09/19/2018] [Accepted: 07/31/2018] [Indexed: 01/09/2023]
Abstract
Mutational correlation patterns found in population-level sequence data for the Human Immunodeficiency Virus (HIV) and the Hepatitis C Virus (HCV) have been demonstrated to be informative of viral fitness. Such patterns can be seen as footprints of the intrinsic functional constraints placed on viral evolution under diverse selective pressures. Here, considering multiple HIV and HCV proteins, we demonstrate that these mutational correlations encode a modular co-evolutionary structure that is tightly linked to the structural and functional properties of the respective proteins. Specifically, by introducing a robust statistical method based on sparse principal component analysis, we identify near-disjoint sets of collectively-correlated residues (sectors) having mostly a one-to-one association to largely distinct structural or functional domains. This suggests that the distinct phenotypic properties of HIV/HCV proteins often give rise to quasi-independent modes of evolution, with each mode involving a sparse and localized network of mutational interactions. Moreover, individual inferred sectors of HIV are shown to carry immunological significance, providing insight for guiding targeted vaccine strategies.
|
36
|
Garrido-Martín D, Pazos F. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues. BMC Bioinformatics 2018; 19:67. [PMID: 29482506 PMCID: PMC5827975 DOI: 10.1186/s12859-018-2084-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/07/2017] [Accepted: 02/21/2018] [Indexed: 11/10/2022]
Abstract
Background
The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as the only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely.
Results
In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant.
Conclusions
These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.
Affiliation(s)
- Diego Garrido-Martín
- Present address: Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, c/ Dr. Aiguader, 88, 08003, Barcelona, Spain; Present address: Universitat Pompeu Fabra (UPF), Plaça de la Mercè, 10-12, 08002, Barcelona, Spain
- Florencio Pazos
- Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), c/ Darwin, 3, 28049, Madrid, Spain
|
37
|
Brown T, Brown N, Stollar EJ. Most yeast SH3 domains bind peptide targets with high intrinsic specificity. PLoS One 2018; 13:e0193128. [PMID: 29470497 PMCID: PMC5823434 DOI: 10.1371/journal.pone.0193128] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/10/2017] [Accepted: 02/04/2018] [Indexed: 01/07/2023]
Abstract
A need exists to develop bioinformatics methods for predicting differences in protein function, especially for members of a domain family that share a common fold yet are found in a diverse array of proteins. Many domain families have been conserved over large evolutionary spans and representative genomic data during these periods are now available. This allows a simple method for grouping domain sequences to reveal common and unique/specific binding residues. As such, we hypothesize that sequence alignment analysis of the yeast SH3 domain family across ancestral species in the fungal kingdom can determine whether each member encodes specific information to bind unique peptide targets. With this approach, we identify important specific residues for a given domain as those that show little conservation within an alignment of yeast domain family members (paralogs) but are conserved in an alignment of its direct relatives (orthologs). We find most of the yeast SH3 domain family members have maintained unique amino acid conservation patterns that suggest they bind peptide targets with high intrinsic specificity through varying degrees of non-canonical recognition. For a minority of domains, we predict a less diverse binding surface, likely requiring additional factors to bind targets specifically. We observe that our predictions are consistent with high throughput binding data, which suggests our approach can probe intrinsic binding specificity in any other interaction domain family that is maintained during evolution.
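The paralog-versus-ortholog criterion described in this abstract can be sketched in a few lines. The Python below scores each alignment column by one minus its normalized Shannon entropy and flags columns that are variable among paralogs but conserved among orthologs. The scoring scheme, function names, and thresholds are assumptions chosen for illustration; the abstract does not specify how the authors measured conservation.

```python
import math
from collections import Counter

def column_conservation(column):
    """Return 1 - normalized Shannon entropy of a column (1.0 = invariant).

    Entropy is normalized by log2(20), the maximum for 20 amino acids.
    Gap handling is deliberately left out of this sketch.
    """
    total = len(column)
    counts = Counter(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(20)

def specific_positions(paralogs, orthologs, low=0.5, high=0.9):
    """Indices of columns variable among paralogs yet conserved among orthologs.

    Both inputs are lists of equal-length aligned sequences sharing the same
    column numbering; `low` and `high` are hypothetical cutoffs.
    """
    hits = []
    for i in range(len(paralogs[0])):
        para = column_conservation([seq[i] for seq in paralogs])
        orth = column_conservation([seq[i] for seq in orthologs])
        if para <= low and orth >= high:
            hits.append(i)
    return hits
```

With toy alignments in which only the middle column varies among paralogs while staying fixed among orthologs, `specific_positions` returns that column's index.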
Collapse
Affiliation(s)
- Tom Brown
- Math and Computer Science Department, Eastern New Mexico University, Portales, NM, United States of America
| | - Nick Brown
- Portales High School, Portales, NM, United States of America
| | - Elliott J. Stollar
- Physical Sciences Department, Eastern New Mexico University, Portales, NM, United States of America
|
38
|
Suplatov D, Sharapova Y, Timonina D, Kopylov K, Švedas V. The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families. J Bioinform Comput Biol 2017; 16:1840005. [PMID: 29361894 DOI: 10.1142/s021972001840005x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins is used to predict correlated substitutions by the mutual information-based CMAT approach, to classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations, and to annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of visualCMAT are organized for convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, to select hotspots and compensatory mutations for rational design and directed evolution experiments aimed at producing novel enzymes with improved properties, and to study the mechanism of selective ligand binding and allosteric communication between topologically independent sites in protein structures.
The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
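To make the idea of mutual information-based correlated mutations concrete, the sketch below computes plain mutual information between two alignment columns. It is only an illustration under stated assumptions: the actual CMAT statistic, its significance testing, and its structural filtering are not reproduced here, and the function name and toy columns are hypothetical.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns, skipping gapped pairs."""
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    freq_ab = Counter(pairs)
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginal frequencies.
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi

# Toy columns (invented): the first pair co-varies perfectly, the second is independent.
print(mutual_information("AAKK", "DDEE"))
print(mutual_information("AAKK", "DEDE"))
```

A perfectly coupled pair of two-state columns carries one bit of mutual information, while independent columns carry none; real alignments fall in between and need a significance threshold.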
Affiliation(s)
- Dmitry Suplatov
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Yana Sharapova
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Daria Timonina
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Kirill Kopylov
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Vytas Švedas
- Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
|
39
|
Sánchez-Gracia A, Guirao-Rico S, Hinojosa-Alvarez S, Rozas J. Computational prediction of the phenotypic effects of genetic variants: basic concepts and some application examples in Drosophila nervous system genes. J Neurogenet 2017; 31:307-319. [DOI: 10.1080/01677063.2017.1398241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
Affiliation(s)
- Alejandro Sánchez-Gracia
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
| | - Sara Guirao-Rico
- Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Bellaterra, Spain
| | - Silvia Hinojosa-Alvarez
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
| | - Julio Rozas
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
|
40
|
Banerjee A, Pal A, Pal D, Mitra P. Ebolavirus interferon antagonists—protein interaction perspectives to combat pathogenesis. Brief Funct Genomics 2017; 17:392-401. [DOI: 10.1093/bfgp/elx034] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
|
41
|
Carrillo-de-Santa-Pau E, Juan D, Pancaldi V, Were F, Martin-Subero I, Rico D, Valencia A. Automatic identification of informative regions with epigenomic changes associated to hematopoiesis. Nucleic Acids Res 2017; 45:9244-9259. [PMID: 28934481 PMCID: PMC5716146 DOI: 10.1093/nar/gkx618] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2017] [Accepted: 07/06/2017] [Indexed: 12/19/2022] Open
Abstract
Hematopoiesis is one of the best characterized biological systems but the connection between chromatin changes and lineage differentiation is not yet well understood. We have developed a bioinformatic workflow to generate a chromatin space that allows the classification of 42 healthy human blood epigenomes from the BLUEPRINT, NIH ROADMAP and ENCODE consortia by their cell type. This approach lets us distinguish different cell types based on their epigenomic profiles, thus recapitulating important aspects of human hematopoiesis. The analysis of the orthogonal dimension of the chromatin space identifies 32,662 chromatin determinant regions (CDRs), genomic regions with different epigenetic characteristics between the cell types. Functional analysis revealed that these regions are linked with cell identities. The inclusion of leukemia epigenomes in the healthy hematological chromatin sample space gives us insights into the healthy cell types that are more epigenetically similar to the disease samples. Further analysis of tumoral epigenetic alterations in hematopoietic CDRs points to sets of genes that are tightly regulated in leukemic transformations and commonly mutated in other tumors. Our method provides an analytical approach to study the relationship between epigenomic changes and cell lineage differentiation. Method availability: https://github.com/david-juan/ChromDet.
Affiliation(s)
| | - David Juan
- Institut de Biologia Evolutiva, Consejo Superior de Investigaciones Científicas-Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, 08003, Spain
| | - Vera Pancaldi
- Barcelona Supercomputing Centre (BSC), Barcelona, 08034, Spain
| | - Felipe Were
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| | - Ignacio Martin-Subero
- Institut d'Investigacions Biomédiques August Pi i Sunyer (IDIBAPS), Department of Anatomic Pathology, Pharmacology and Microbiology, University of Barcelona, Barcelona, 08036, Spain
| | - Daniel Rico
- Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
| | - Alfonso Valencia
- Barcelona Supercomputing Centre (BSC), Barcelona, 08034, Spain; ICREA, Pg. Lluís Companys 23, Barcelona, 08010, Spain
|
42
|
Golestan Hashemi FS, Razi Ismail M, Rafii Yusop M, Golestan Hashemi MS, Nadimi Shahraki MH, Rastegari H, Miah G, Aslani F. Intelligent mining of large-scale bio-data: Bioinformatics applications. BIOTECHNOL BIOTEC EQ 2017. [DOI: 10.1080/13102818.2017.1364977] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Affiliation(s)
- Farahnaz Sadat Golestan Hashemi
- Plant Genetics, AgroBioChem Department, Gembloux Agro-Bio Tech, University of Liege, Liege, Belgium
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Mohd Razi Ismail
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Mohd Rafii Yusop
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Mahboobe Sadat Golestan Hashemi
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
| | - Mohammad Hossein Nadimi Shahraki
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
| | - Hamid Rastegari
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
| | - Gous Miah
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Farzad Aslani
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
|
43
|
Lam SD, Das S, Sillitoe I, Orengo C. An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences. Acta Crystallogr D Struct Biol 2017; 73:628-640. [PMID: 28777078 PMCID: PMC5571743 DOI: 10.1107/s2059798317008920] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2016] [Accepted: 06/14/2017] [Indexed: 12/02/2022] Open
Abstract
Computational modelling of proteins has been a major catalyst in structural biology. Bioinformatics groups have exploited the repositories of known structures to predict high-quality structural models with high efficiency at low cost. This article provides an overview of comparative modelling, reviews recent developments and describes resources dedicated to large-scale comparative modelling of genome sequences. The value of subclustering protein domain superfamilies to guide the template-selection process is investigated. Some recent cases in which structural modelling has aided experimental work to determine very large macromolecular complexes are also cited.
Affiliation(s)
- Su Datt Lam
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
- School of Biosciences and Biotechnology, Faculty of Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
| | - Sayoni Das
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Christine Orengo
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
|
44
|
Indrischek H, Prohaska SJ, Gurevich VV, Gurevich EV, Stadler PF. Uncovering missing pieces: duplication and deletion history of arrestins in deuterostomes. BMC Evol Biol 2017; 17:163. [PMID: 28683816 PMCID: PMC5501109 DOI: 10.1186/s12862-017-1001-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2016] [Accepted: 06/19/2017] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND The cytosolic arrestin proteins mediate desensitization of activated G protein-coupled receptors (GPCRs) via competition with G proteins for the active phosphorylated receptors. Arrestins in active, including receptor-bound, conformation are also transducers of signaling. Therefore, this protein family is an attractive therapeutic target. The signaling outcome is believed to be a result of structural and sequence-dependent interactions of arrestins with GPCRs and other protein partners. Here we elucidated the detailed evolution of arrestins in deuterostomes. RESULTS Identity and number of arrestin paralogs were determined searching deuterostome genomes and gene expression data. In contrast to standard gene prediction methods, our strategy first detects exons situated on different scaffolds and then solves the problem of assigning them to the correct gene. This increases both the completeness and the accuracy of the annotation in comparison to conventional database search strategies applied by the community. The employed strategy enabled us to map in detail the duplication- and deletion history of arrestin paralogs including tandem duplications, pseudogenizations and the formation of retrogenes. The two rounds of whole genome duplications in the vertebrate stem lineage gave rise to four arrestin paralogs. Surprisingly, visual arrestin ARR3 was lost in the mammalian clades Afrotheria and Xenarthra. Duplications in specific clades, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. CONCLUSION The current study traces the functional evolution of deuterostome arrestins in unprecedented detail. Based on a precise re-annotation of the exon-intron structure at nucleotide resolution, we infer the gain and loss of paralogs and patterns of conservation, co-variation and selection.
Affiliation(s)
- Henrike Indrischek
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany.
| | - Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
| | - Vsevolod V Gurevich
- Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
| | - Eugenia V Gurevich
- Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany
- Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria
- Center for non-coding RNA in Technology and Health, Grønegårdsvej 3, Frederiksberg C, DK-1870, Denmark
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
|
45
|
Pai PP, Dattatreya RK, Mondal S. Ensemble Architecture for Prediction of Enzyme‐ligand Binding Residues Using Evolutionary Information. Mol Inform 2017. [DOI: 10.1002/minf.201700021] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Affiliation(s)
- Priyadarshini P. Pai
- Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, Near NH17 Bypass Road, Zuarinagar, Goa, India
| | - Rohit Kadam Dattatreya
- Department of Economics, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, Near NH17 Bypass Road, Zuarinagar, Goa, India, PIN: 403726
| | - Sukanta Mondal
- Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K. Birla Goa Campus, Near NH17 Bypass Road, Zuarinagar, Goa, India
|
46
|
Abstract
Out of the five members of the Ebolavirus family, four cause life-threatening disease, whereas the fifth, Reston virus (RESTV), is nonpathogenic in humans. The reasons for this discrepancy remain unclear. In this review, we analyze the currently available information to provide a state-of-the-art summary of the factors that determine the human pathogenicity of Ebolaviruses. RESTV causes sporadic infections in cynomolgus monkeys and is found in domestic pigs throughout the Philippines and China. Phylogenetic analyses revealed that RESTV is most closely related to the Sudan virus, which causes a high mortality rate in humans. Amino acid sequence differences between RESTV and the other Ebolaviruses are found in all nine Ebolavirus proteins, though no one residue appears sufficient to confer pathogenicity. Changes in the glycoprotein contribute to differences in Ebolavirus pathogenicity but are not sufficient to confer pathogenicity on their own. Similarly, differences in VP24 and VP35 affect viral immune evasion and are associated with changes in human pathogenicity. A recent in silico analysis systematically determined the functional consequences of sequence variations between RESTV and human-pathogenic Ebolaviruses. Multiple positions in VP24 were differently conserved between RESTV and the other Ebolaviruses and may alter human pathogenicity. In conclusion, the factors that determine the pathogenicity of Ebolaviruses in humans remain insufficiently understood. An improved understanding of these pathogenicity-determining factors is of crucial importance for disease prevention and for the early detection of emergent and potentially human-pathogenic RESTVs.
|
47
|
Moll M, Finn PW, Kavraki LE. Structure-guided selection of specificity determining positions in the human Kinome. BMC Genomics 2016; 17 Suppl 4:431. [PMID: 27556159 PMCID: PMC5001202 DOI: 10.1186/s12864-016-2790-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
Background: The human kinome contains many important drug targets. It is well-known that inhibitors of protein kinases bind with very different selectivity profiles. This is also the case for inhibitors of many other protein families. The increased availability of protein 3D structures has provided much information on the structural variation within a given protein family. However, the relationship between structural variations and binding specificity is complex and incompletely understood. We have developed a structural bioinformatics approach which provides an analysis of key determinants of binding selectivity as a tool to enhance the rational design of drugs with a specific selectivity profile. Results: We propose a greedy algorithm that computes a subset of residue positions in a multiple sequence alignment such that structural and chemical variation in those positions helps explain known binding affinities. By providing this information, the main purpose of the algorithm is to provide experimentalists with possible insights into how the selectivity profile of certain inhibitors is achieved, which is useful for lead optimization. In addition, the algorithm can also be used to predict binding affinities for structures whose affinity for a given inhibitor is unknown. The algorithm’s performance is demonstrated using an extensive dataset for the human kinome. Conclusion: We show that the binding affinity of 38 different kinase inhibitors can be explained with consistently high precision and accuracy using the variation of at most six residue positions in the kinome binding site. We show for several inhibitors that we are able to identify residues that are known to be functionally important.
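A greedy selection of alignment positions of the kind this abstract describes might look like the sketch below. It is a simplified stand-in and not the published algorithm: it uses residue identity only (no structural or chemical features), binary binder/non-binder labels instead of affinities, and a hypothetical "purity" score; all names and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def purity(seqs, labels, positions):
    """Fraction of sequences matching the majority label of their residue-pattern group."""
    groups = defaultdict(list)
    for seq, label in zip(seqs, labels):
        groups[tuple(seq[p] for p in positions)].append(label)
    agree = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return agree / len(seqs)

def greedy_positions(seqs, labels, max_positions=6):
    """Greedily add the position that most improves purity; stop when nothing improves."""
    chosen = []
    best = purity(seqs, labels, chosen)
    candidates = set(range(len(seqs[0])))
    while candidates and len(chosen) < max_positions:
        # Ties are broken in favor of the lowest column index.
        pick = max(sorted(candidates), key=lambda q: purity(seqs, labels, chosen + [q]))
        score = purity(seqs, labels, chosen + [pick])
        if score <= best:
            break
        chosen.append(pick)
        candidates.remove(pick)
        best = score
    return chosen, best

# Toy data (invented): only column 1 (T vs. M) separates binders from non-binders.
seqs = ["GTA", "GTV", "GMA", "GMV"]
labels = [True, True, False, False]
print(greedy_positions(seqs, labels))
```

The cap of six positions mirrors the "at most six residue positions" mentioned in the abstract. Because this purity score is computed on the same data used for selection, it can only increase as positions are added, so a real application would need held-out validation.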
Affiliation(s)
- Mark Moll
- Department of Computer Science, Rice University, PO Box 1892, Houston, 77251, TX, USA.
| | - Paul W Finn
- University of Buckingham, Hunter St, Buckingham, UK
| | - Lydia E Kavraki
- Department of Computer Science, Rice University, PO Box 1892, Houston, 77251, TX, USA
|
48
|
Computational analysis of Ebolavirus data: prospects, promises and challenges. Biochem Soc Trans 2016; 44:973-8. [DOI: 10.1042/bst20160074] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2016] [Indexed: 12/22/2022]
Abstract
The ongoing Ebola virus (also known as Zaire ebolavirus, a member of the Ebolavirus family) outbreak in West Africa has so far resulted in >28000 confirmed cases compared with previous Ebolavirus outbreaks that affected a maximum of a few hundred individuals. Hence, Ebolaviruses pose a much greater threat than we may have expected (or hoped). An improved understanding of the virus biology is essential to develop therapeutic and preventive measures and to be better prepared for future outbreaks by members of the Ebolavirus family. Computational investigations can complement wet laboratory research for biosafety level 4 pathogens such as Ebolaviruses for which the wet experimental capacities are limited due to a small number of appropriate containment laboratories. During the current West Africa outbreak, sequence data from many Ebola virus genomes became available providing a rich resource for computational analysis. Here, we consider the studies that have already reported on the computational analysis of these data. A range of properties have been investigated including Ebolavirus evolution and pathogenicity, prediction of micro RNAs and identification of Ebolavirus specific signatures. However, the accuracy of the results remains to be confirmed by wet laboratory experiments. Therefore, communication and exchange between computational and wet laboratory researchers is necessary to make maximum use of computational analyses and to iteratively improve these approaches.
|
49
|
Gao J, Zhang Q, Liu M, Zhu L, Wu D, Cao Z, Zhu R. bSiteFinder, an improved protein-binding sites prediction server based on structural alignment: more accurate and less time-consuming. J Cheminform 2016; 8:38. [PMID: 27403208 PMCID: PMC4939519 DOI: 10.1186/s13321-016-0149-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2016] [Accepted: 06/30/2016] [Indexed: 11/10/2022] Open
Abstract
MOTIVATION Prediction of protein-binding sites lays a foundation for the functional annotation of proteins and for structure-based drug design. As the number of available protein structures increases, structural alignment-based algorithms have become the dominant approach for protein-binding site prediction. However, present algorithms underutilize the ever-increasing number of three-dimensional protein-ligand complex structures (bound proteins), and could be improved in the alignment process, the selection of templates and the clustering of templates. Herein, we built the largest database of bound templates to date, with stringent quality control, and on this basis developed bSiteFinder, a protein-binding site prediction server. RESULTS By introducing Homology Indexing, Chain Length Indexing, Stability of Complex and Optimized Multiple-Templates Clustering into our algorithm, the efficiency of our server has been significantly improved. Further, the accuracy was approximately 2-10 % higher than that of other algorithms when tested on either the bound or the unbound dataset. For the 210 bound dataset, bSiteFinder achieved accuracies of up to 94.8 % (MCC 0.95). For the 48 bound/unbound dataset, bSiteFinder achieved accuracies of up to 93.8 % for bound proteins (MCC 0.95) and 85.4 % for unbound proteins (MCC 0.72). Our bSiteFinder server is freely available at http://binfo.shmtu.edu.cn/bsitefinder/, and the source code is provided at the methods page. CONCLUSION An online bSiteFinder server is freely available at http://binfo.shmtu.edu.cn/bsitefinder/. Our work lays a foundation for the functional annotation of proteins and for structure-based drug design.
With the ever-increasing number of three-dimensional protein-ligand complex structures, our server should become more accurate and less time-consuming. Graphical Abstract: bSiteFinder (http://binfo.shmtu.edu.cn/bsitefinder/), a protein-binding site prediction server, was developed based on the largest database of bound templates to date, with stringent quality control. By introducing Homology Indexing, Chain Length Indexing, Stability of Complex and Optimized Multiple-Templates Clustering into our algorithm, the efficiency of our server has been significantly improved. Moreover, the accuracy was approximately 2-10 % higher than that of other algorithms when tested on either the bound or the unbound dataset.
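For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows how accuracy and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The counts used are invented for illustration and are not taken from the bSiteFinder evaluation; the function name is likewise hypothetical.

```python
import math

def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Hypothetical counts: 45 true positives, 40 true negatives, 10 false positives, 5 false negatives.
acc, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)
print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")
```

Unlike accuracy, MCC accounts for all four cells of the confusion matrix, which makes it more informative when binding and non-binding classes are unbalanced.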
Affiliation(s)
- Jun Gao
- Department of Bioinformatics, Tongji University, Shanghai, 200092 People's Republic of China ; School of Information Engineering, Shanghai Maritime University, Shanghai, 201306 People's Republic of China
| | - Qingchen Zhang
- Department of Bioinformatics, Tongji University, Shanghai, 200092 People's Republic of China
| | - Min Liu
- School of Information Engineering, Shanghai Maritime University, Shanghai, 201306 People's Republic of China
| | - Lixin Zhu
- Digestive Diseases and Nutrition Center, Department of Pediatrics, The State University of New York at Buffalo, Buffalo, NY 14260 USA ; Genomics, Environment, and Microbiome Community of Excellence, The State University of New York at Buffalo, Buffalo, NY 14203 USA ; Institute of Digestive Diseases, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032 People's Republic of China
| | - Dingfeng Wu
- Department of Bioinformatics, Tongji University, Shanghai, 200092 People's Republic of China
| | - Zhiwei Cao
- Department of Bioinformatics, Tongji University, Shanghai, 200092 People's Republic of China
| | - Ruixin Zhu
- Department of Bioinformatics, Tongji University, Shanghai, 200092 People's Republic of China
|
50
|
Hu X, Dong Q, Yang J, Zhang Y. Recognizing metal and acid radical ion-binding sites by integrating ab initio modeling with template-based transferals. Bioinformatics 2016; 32:3260-3269. [PMID: 27378301 DOI: 10.1093/bioinformatics/btw396] [Citation(s) in RCA: 93] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2016] [Accepted: 06/18/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION More than half of proteins require binding of metal and acid radical ions for their structure and function. Identification of the ion-binding locations is important for understanding the biological functions of proteins. Due to the small size and high versatility of the metal and acid radical ions, however, computational prediction of their binding sites remains difficult. RESULTS We proposed a new ligand-specific approach devoted to the binding site prediction of 13 ligands, comprising metal ions (Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Ca²⁺, Mg²⁺, Mn²⁺, Na⁺, K⁺) and acid radical ions (CO₃²⁻, NO₂⁻, SO₄²⁻, PO₄³⁻), that are most frequently seen in protein databases. A sequence-based ab initio model is first trained on sequence profiles, where a modified AdaBoost algorithm is extended to balance binding and non-binding residue samples. A composite method IonCom is then developed to combine the ab initio model with multiple threading alignments for further improving the robustness of the binding site predictions. The pipeline was tested using 5-fold cross validations on a comprehensive set of 2,100 non-redundant proteins bound with 3,075 small ion ligands. Significant advantage was demonstrated compared with the state-of-the-art ligand-binding methods including COACH and TargetS for high-accuracy ion-binding site identification. Detailed data analyses show that the major advantage of IonCom lies in the integration of complementary ab initio and template-based components. Ion-specific feature design and binding library selection also contribute to the improvement of small ion ligand binding predictions. AVAILABILITY AND IMPLEMENTATION http://zhanglab.ccmb.med.umich.edu/IonCom CONTACT: hxz@imut.edu.cn or zhng@umich.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiuzhen Hu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA; College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
| | - Qiwen Dong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA; Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, China
| | - Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA; School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
|