1
Nana Teukam YG, Kwate Dassi L, Manica M, Probst D, Schwaller P, Laino T. Language models can identify enzymatic binding sites in protein sequences. Comput Struct Biotechnol J 2024; 23:1929-1937. [PMID: 38736695 PMCID: PMC11087710 DOI: 10.1016/j.csbj.2024.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/05/2024] [Accepted: 04/05/2024] [Indexed: 05/14/2024]
Abstract
Recent advances in language modeling have had a tremendous impact on how we handle sequential data in science. Language architectures have emerged as a hotbed of innovation and creativity in natural language processing over the last decade, and have since gained prominence in modeling proteins and chemical processes, elucidating structural relationships from textual/sequential data. Surprisingly, some of these relationships refer to three-dimensional structural features, raising important questions about the dimensionality of the information encoded within sequential data. Here, we demonstrate that applying a language model architecture, without supervision, to a language representation of bio-catalyzed chemical reactions can capture the signal underlying the atomic interactions at the substrate-binding site. This allows us to identify the three-dimensional position of the binding site in unknown protein sequences. The language representation comprises a reaction simplified molecular-input line-entry system (SMILES) string for the substrates and products, and amino acid sequence information for the enzyme. With no supervision, this approach recovers 52.13% of the binding site when co-crystallized substrate-enzyme structures are taken as ground truth, vastly outperforming other attention-based models.
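The abstract does not spell out how attention is turned into a binding-site prediction or how "recovery" is scored. As a minimal sketch, assuming each residue receives one aggregated attention score (how heads and layers are combined is an assumption) and that recovery means the share of co-crystal binding residues found among the top-k scored residues, the metric could be computed as below. The function names, scores and binding-site indices are hypothetical.

```python
from typing import Sequence, Set


def top_k_residues(scores: Sequence[float], k: int) -> Set[int]:
    """Return the 0-based indices of the k highest-scoring residues."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def binding_site_recovery(scores: Sequence[float], true_site: Set[int], k: int) -> float:
    """Fraction of ground-truth binding residues found among the top-k scored residues."""
    if not true_site:
        raise ValueError("ground-truth binding site is empty")
    predicted = top_k_residues(scores, k)
    return len(predicted & true_site) / len(true_site)


# Hypothetical aggregated attention per residue for a 10-residue enzyme.
attention = [0.01, 0.20, 0.05, 0.30, 0.02, 0.15, 0.03, 0.04, 0.12, 0.08]
# Hypothetical residues contacting the substrate in a co-crystallized structure.
site = {1, 3, 6, 8}
print(binding_site_recovery(attention, site, k=4))  # 0.75
```

Choosing k equal to the size of the true site, as here, makes recovery coincide with precision; other cutoffs trade the two off.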
Affiliation(s)
- Loïc Kwate Dassi
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Matteo Manica
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Daniel Probst
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Philippe Schwaller
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Teodoro Laino
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
2
Pourhajibagher M, Javanmard Z, Bahador A. Molecular docking and antimicrobial activities of photoexcited inhibitors in antimicrobial photodynamic therapy against Enterococcus faecalis biofilms in endodontic infections. AMB Express 2024; 14:94. [PMID: 39215887 PMCID: PMC11365891 DOI: 10.1186/s13568-024-01751-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Accepted: 08/14/2024] [Indexed: 09/04/2024]
Abstract
Antimicrobial photodynamic therapy (aPDT) is a promising approach to combat antibiotic resistance in endodontic infections. It eliminates residual bacteria from the root canal space and reduces the need for antibiotics. To enhance its effectiveness, an in silico and in vitro study was performed to investigate the potential of targeted aPDT using the natural photosensitizers Kojic acid and Parietin. This approach aims to inhibit biofilm formation by Enterococcus faecalis, a frequent cause of endodontic infections, by targeting the Ace and Esp proteins. After the physicochemical characteristics of the Ace and Esp proteins were determined and the quality of their models assessed, molecular dynamics simulations were performed to identify structural variations. The stability and physical movement of the protein-ligand complexes were evaluated. In silico molecular docking was conducted, followed by ADME/Tox profiling, pharmacokinetic characterization, and assessment of the drug-likeness of the natural photosensitizers. The study also investigated changes in the expression of genes (esp and ace) involved in E. faecalis biofilm formation. The results showed that both Kojic acid and Parietin complied with Lipinski's rule of five and exhibited drug-like properties. In silico analysis indicated stable complexes between the Ace and Esp proteins and the natural photosensitizers, and molecular docking showed good binding affinity. Additionally, expression of the ace and esp genes was significantly downregulated after aPDT using Kojic acid and Parietin with blue light compared with the control group. This investigation concluded that the drug-like Kojic acid and Parietin could interact efficiently with the Ace and Esp proteins with strong binding affinity. Hence, natural photosensitizer-mediated aPDT can be considered a promising adjunctive treatment for endodontic infections.
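Lipinski's rule of five flags a compound as likely to have poor oral absorption when it exceeds the standard thresholds: molecular weight above 500 Da, logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors; compliance is commonly taken as at most one violation. The sketch below assumes the descriptors were already computed by a cheminformatics toolkit; the `Descriptors` container, the function names and the example values are hypothetical and do not describe Kojic acid or Parietin.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient (log scale)
    h_donors: int      # hydrogen-bond donor count
    h_acceptors: int   # hydrogen-bond acceptor count


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a compound as compliant when it has at most `max_violations` violations."""
    return lipinski_violations(d) <= max_violations


# Hypothetical descriptor sets for illustration only.
small_polar = Descriptors(mol_weight=150.0, logp=0.5, h_donors=2, h_acceptors=4)
large_lipophilic = Descriptors(mol_weight=650.0, logp=6.2, h_donors=3, h_acceptors=12)
print(passes_rule_of_five(small_polar), lipinski_violations(large_lipophilic))  # True 3
```

The one-violation tolerance is exposed as a parameter because studies differ on whether any violation disqualifies a compound.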
Affiliation(s)
- Maryam Pourhajibagher
- Dental Research Center, Dentistry Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Zahra Javanmard
- Department of Microbiology, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Abbas Bahador
- Department of Microbiology, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Fellowship in Clinical Laboratory Sciences, BioHealth Lab, Tehran, Iran
3
Singh K, Malik YS. ANN based prediction of ligand binding sites outside deep cavities to facilitate drug designing. Curr Res Struct Biol 2024; 7:100144. [PMID: 38681239 PMCID: PMC11047793 DOI: 10.1016/j.crstbi.2024.100144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Revised: 04/12/2024] [Accepted: 04/12/2024] [Indexed: 05/01/2024]
Abstract
Ever-changing environmental conditions and pollution are prime reasons for the onset of several emerging and re-emerging diseases. This demands faster design of new drugs to curb these deadly diseases and to treat animals and humans sooner. Drug molecules interact with the protein surface only at specific locations, termed ligand binding sites (LBS). Therefore, knowledge of LBS is required for rational drug design. Existing geometric LBS prediction methods search for cavities, because 83% of LBS are found in deep cavities; however, these methods usually fail when LBS lie outside deep cavities. To overcome this challenge, the present work provides an artificial neural network (ANN) based method to predict LBS outside deep cavities in animal proteins, including human proteins, to facilitate drug design. A feed-forward backpropagation neural network was trained using 38 structural, atomic, physicochemical, and evolutionary features that discriminate LBS from non-LBS residues located in the roughest patch extracted from the protein surface. This ANN-based method performed 76% better on proteins for which the cavity subspace (extracted by MetaPocket 2.0, a consensus method) failed to predict LBS because they lie outside deep cavities. Predicting LBS outside deep cavities will facilitate drug design for proteins where this is currently not possible owing to a lack of LBS information, since geometric LBS prediction methods rely on extracting deep cavities.
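A feed-forward network trained by backpropagation on per-residue feature vectors can be sketched in a few lines of NumPy. Only the 38-feature input size comes from the abstract; the single hidden layer of 16 tanh units, the sigmoid output, binary cross-entropy loss, learning rate, epoch count and the synthetic data are assumptions chosen for illustration, not the authors' configuration.

```python
import numpy as np

N_FEATURES = 38  # per-residue descriptor count reported in the abstract


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=16, lr=0.5, epochs=500, seed=0):
    """One-hidden-layer network trained by full-batch backpropagation on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    n = X.shape[0]
    eps = 1e-12  # guards log(0)
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, sigmoid output = P(residue is an LBS residue).
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        losses.append(float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))))
        # Backward pass: gradient of mean BCE w.r.t. the output logit is (p - y) / n.
        d_out = (p - y) / n
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0)
        d_h = (d_out @ W2.T) * (1 - h ** 2)  # derivative of tanh
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Plain gradient-descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict_proba(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()


# Synthetic stand-in for LBS (1) vs. non-LBS (0) residues; not real protein data.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, N_FEATURES))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)
params, losses = train_mlp(X_demo, y_demo)
accuracy = np.mean((predict_proba(params, X_demo) > 0.5) == (y_demo == 1))
print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}, training accuracy {accuracy:.2f}")
```

In practice the features would be standardized and performance judged on held-out proteins rather than on the training residues used in this toy run.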
Affiliation(s)
- Kalpana Singh
- College of Animal Biotechnology, Guru Angad Dev Veterinary and Animal Sciences University, Ludhiana-141004, India
- Yashpal Singh Malik
- College of Animal Biotechnology, Guru Angad Dev Veterinary and Animal Sciences University, Ludhiana-141004, India
4
Sarkar M, Saha S. Modeling of SARS-CoV-2 Virus Proteins: Implications on Its Proteome. Methods Mol Biol 2023; 2627:265-299. [PMID: 36959453 DOI: 10.1007/978-1-0716-2974-1_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Coronavirus disease 2019 (COVID-19) is a severe acute respiratory syndrome (SARS) caused by SARS-CoV-2, a betacoronavirus. SARS-CoV-2 is similar to the earlier SARS- and MERS-causing strains and has infected nearly 650 million people worldwide, while the death toll has passed six million (as of December 2022). In this chapter, we look at how computational modeling of the viral proteins could help us understand the various processes in the viral life cycle inside the host; such understanding might provide key insights for mitigating this and future threats. It also helps us identify key targets for drug discovery and vaccine development.
Affiliation(s)
- Manish Sarkar
- Hochschule für Technik und Wirtschaft (HTW) Berlin, Berlin, Germany
- MedInsights SAS, Paris, France
- Soham Saha
- MedInsights, Veuilly la Poterie, France
- MedInsights SAS, Paris, France
5
Mendoza Rengifo E, Stelmastchuk Benassi Fontolan L, Ribamar Ferreira-Junior J, Bleicher L, Penner-Hahn J, Charles Garratt R. Unexpected plasticity of the quaternary structure of iron-manganese superoxide dismutases. J Struct Biol 2022; 214:107855. [PMID: 35390463 DOI: 10.1016/j.jsb.2022.107855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Revised: 03/08/2022] [Accepted: 04/01/2022] [Indexed: 10/18/2022]
Abstract
Protein 3D structure can be remarkably robust to the accumulation of mutations during evolution. On the other hand, a single amino acid substitution can sometimes be sufficient to generate dramatic and completely unpredictable structural consequences. In an attempt to rationally alter the metal-ion preference at the active site of a member of the iron/manganese superoxide dismutase family, two examples of the latter phenomenon were identified. Site-directed mutants of SOD from Trichoderma reesei were generated and studied crystallographically together with the wild-type enzyme. Although they were chosen for their potential impact on the redox potential of the metal, two of the mutations (D150G and G73A) in fact resulted in significant alterations to the protein quaternary structure. The D150G mutant presented alternative inter-subunit contacts leading to a loss of symmetry of the wild-type tetramer, whereas the G73A mutation transformed the tetramer into an octamer despite not participating directly in any of the inter-subunit interfaces. We conclude that there is considerable intrinsic plasticity in the Fe/MnSOD fold that can be unpredictably affected by single amino acid substitutions. Just as phenotypic defects at the organism level can reveal much about normal function, such mutations can teach us much about the subtleties of protein structure.
Affiliation(s)
- Emerita Mendoza Rengifo
- Laboratory of Structural Biology, Sao Carlos Institute of Physics, University of Sao Paulo, Sao Carlos, Sao Paulo, Brazil
- Jose Ribamar Ferreira-Junior
- Laboratory of Biotechnology, School of Arts, Sciences and Humanities, University of Sao Paulo, Sao Paulo, Brazil
- Lucas Bleicher
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
- James Penner-Hahn
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, United States
- Richard Charles Garratt
- Laboratory of Structural Biology, Sao Carlos Institute of Physics, University of Sao Paulo, Sao Carlos, Sao Paulo, Brazil
6
Hot spots-making directed evolution easier. Biotechnol Adv 2022; 56:107926. [DOI: 10.1016/j.biotechadv.2022.107926] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/24/2021] [Revised: 01/04/2022] [Accepted: 02/07/2022] [Indexed: 01/20/2023]
7
Kurt F, Filiz E, Aydın A. Genome-wide identification of serine acetyltransferase (SAT) gene family in rice (Oryza sativa) and their expressions under salt stress. Mol Biol Rep 2021; 48:6277-6290. [PMID: 34389920 DOI: 10.1007/s11033-021-06620-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/16/2021] [Accepted: 08/03/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND Assimilation of sulfur into cysteine (Cys) occurs in the presence of serine acetyltransferase (SAT). Drought and salt stresses are known to be regulated by abscisic acid (ABA), whose biosynthesis is limited by Cys. Cys is formed by the cysteine synthase complex, which depends on the SAT and OASTL enzymes. Functions of some SAT genes have been identified in Arabidopsis; however, it is not known how SAT genes are regulated in rice (Oryza sativa) under salt stress. METHODS AND RESULTS Sequence, protein domain, gene structure, nucleotide, phylogenetic, selection, gene duplication, motif, synteny, digital expression and co-expression, secondary and tertiary protein structure, and binding site analyses were conducted. The wet-lab expression of OsSAT genes was also tested under salt stress. OsSATs have undergone purifying selection. Segmental and tandem duplications may be driving forces of the structural and functional divergence of OsSATs. Digital expression analyses showed that jasmonic acid (JA) was the only hormone inducing the expression of OsSAT1;1, OsSAT2;1, and OsSAT2;2, whereas auxin and ABA triggered only OsSAT1;1 expression. The leaf blade is the only plant organ in which all OsSATs except OsSAT1;1 were expressed. Wet-lab expression analyses indicated that OsSAT1;1, OsSAT1;2 and OsSAT1;3 were upregulated at different exposure times of salt stress. CONCLUSIONS OsSAT1;1, which is highly expressed in rice roots, may be a hub gene regulated by cross-talk among the JA, ABA and auxin hormones. This hormonal cross-talk and the structural variations of OsSAT proteins may also explain the different responses of OsSATs to salt stress.
Affiliation(s)
- Fırat Kurt
- Department of Plant Production and Technologies, Faculty of Applied Sciences, Mus Alparslan University, Mus, Turkey
- Ertugrul Filiz
- Department of Crop and Animal Production, Cilimli Vocational School, Duzce University, Cilimli, Duzce, Turkey
- Adnan Aydın
- Department of Agricultural Biotechnology, Faculty of Agriculture, Iğdır University, Iğdır, Turkey
8
Das S, Scholes HM, Sen N, Orengo C. CATH functional families predict functional sites in proteins. Bioinformatics 2021; 37:1099-1106. [PMID: 33135053 PMCID: PMC8150129 DOI: 10.1093/bioinformatics/btaa937] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/05/2020] [Revised: 09/30/2020] [Accepted: 10/27/2020] [Indexed: 01/12/2023]
Abstract
MOTIVATION Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive of functional sites. AVAILABILITY AND IMPLEMENTATION https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
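The abstract states that conserved FunFam residues are enriched in functional sites without giving the statistic used. One common way to express such enrichment is fold enrichment: the functional-site rate among conserved residues divided by the overall functional-site rate. The sketch below assumes per-residue boolean labels; the function name and the ten-residue example are hypothetical.

```python
from typing import Sequence


def fold_enrichment(conserved: Sequence[bool], functional: Sequence[bool]) -> float:
    """P(functional | conserved) / P(functional) over aligned per-residue labels."""
    if len(conserved) != len(functional):
        raise ValueError("per-residue label lists must have equal length")
    n = len(functional)
    n_functional = sum(functional)
    n_conserved = sum(conserved)
    if n_functional == 0 or n_conserved == 0:
        raise ValueError("need at least one conserved and one functional residue")
    n_both = sum(c and f for c, f in zip(conserved, functional))
    return (n_both / n_conserved) / (n_functional / n)


# Hypothetical labels: residues 0-3 conserved; residues 0, 1 and 5 annotated as functional.
conserved = [True] * 4 + [False] * 6
functional = [True, True, False, False, False, True, False, False, False, False]
print(round(fold_enrichment(conserved, functional), 3))  # 1.667
```

A value above 1 means functional residues are over-represented among conserved positions; a significance test (for example Fisher's exact test on the same 2x2 counts) would normally accompany it.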
Affiliation(s)
- Sayoni Das
- PrecisionLife Ltd., Long Hanborough, OX29 8LJ Oxford, UK
- Harry M Scholes
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
9
Wood DJ, Lopez-Fernandez JD, Knight LE, Al-Khawaldeh I, Gai C, Lin S, Martin MP, Miller DC, Cano C, Endicott JA, Hardcastle IR, Noble MEM, Waring MJ. FragLites-Minimal, Halogenated Fragments Displaying Pharmacophore Doublets. An Efficient Approach to Druggability Assessment and Hit Generation. J Med Chem 2019; 62:3741-3752. [PMID: 30860382 DOI: 10.1021/acs.jmedchem.9b00304] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Indexed: 12/15/2022]
Abstract
Identifying ligand binding sites on proteins is a critical step in target-based drug discovery. Current approaches require resource-intensive screening of large libraries of lead-like or fragment molecules. Here, we describe an efficient and effective experimental approach to mapping interaction sites using a set of halogenated compounds expressing paired hydrogen-bonding motifs, termed FragLites. The FragLites reveal productive drug-like interactions, which are detected sensitively and unambiguously by X-ray crystallography, exploiting the anomalous scattering of the halogen substituent. This mapping of protein interaction surfaces provides an assessment of druggability and can identify efficient starting points for the de novo design of hit molecules incorporating the interacting motifs. The approach is illustrated by mapping cyclin-dependent kinase 2, where it successfully identifies orthosteric and allosteric sites. The hits were rapidly elaborated into efficient lead-like molecules. Hence, the approach provides a new method for identifying ligand sites, assessing tractability and discovering new leads.
Affiliation(s)
- Daniel J Wood
- Northern Institute for Cancer Research, Medical School, Newcastle University, Paul O'Gorman Building, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
- J Daniel Lopez-Fernandez
- Northern Institute for Cancer Research, Chemistry, School of Natural and Environmental Sciences, Newcastle University, Bedson Building, Newcastle upon Tyne NE1 7RU, UK
- Leanne E Knight
- Northern Institute for Cancer Research, Chemistry, School of Natural and Environmental Sciences, Newcastle University, Bedson Building, Newcastle upon Tyne NE1 7RU, UK
- Islam Al-Khawaldeh
- Northern Institute for Cancer Research, Chemistry, School of Natural and Environmental Sciences, Newcastle University, Bedson Building, Newcastle upon Tyne NE1 7RU, UK
- Conghao Gai
- Northern Institute for Cancer Research, Chemistry, School of Natural and Environmental Sciences, Newcastle University, Bedson Building, Newcastle upon Tyne NE1 7RU, UK
- Shengying Lin
- Northern Institute for Cancer Research, Chemistry, School of Natural and Environmental Sciences, Newcastle University, Bedson Building, Newcastle upon Tyne NE1 7RU, UK
- Mathew P Martin
- Northern Institute for Cancer Research, Medical School, Newcastle University, Paul O'Gorman Building, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
- Duncan C Miller
- Northern Institute for Cancer Research, Chemistry, School of Natural and Environmental Sciences, Newcastle University, Bedson Building, Newcastle upon Tyne NE1 7RU, UK
- Céline Cano
- Northern Institute for Cancer Research, Chemistry, School of Natural and Environmental Sciences, Newcastle University, Bedson Building, Newcastle upon Tyne NE1 7RU, UK
- Jane A Endicott
- Northern Institute for Cancer Research, Medical School, Newcastle University, Paul O'Gorman Building, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
- Ian R Hardcastle
- Northern Institute for Cancer Research, Chemistry, School of Natural and Environmental Sciences, Newcastle University, Bedson Building, Newcastle upon Tyne NE1 7RU, UK
- Martin E M Noble
- Northern Institute for Cancer Research, Medical School, Newcastle University, Paul O'Gorman Building, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
- Michael J Waring
- Northern Institute for Cancer Research, Chemistry, School of Natural and Environmental Sciences, Newcastle University, Bedson Building, Newcastle upon Tyne NE1 7RU, UK
10
Gil N, Fiser A. The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis. Bioinformatics 2019; 35:12-19. [PMID: 29947739 PMCID: PMC6298051 DOI: 10.1093/bioinformatics/bty523] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/23/2018] [Revised: 04/20/2018] [Accepted: 06/26/2018] [Indexed: 11/12/2022]
Abstract
Motivation The analysis of sequence conservation patterns has been widely used for over half a century to identify functionally important (catalytic and ligand-binding) protein residues. Despite decades of development, state-of-the-art non-template-based functional residue prediction methods must, on average, predict ∼25% of a protein's total residues to correctly identify half of its functional site residues. The overwhelming proportion of false positives results in reported F-scores of ∼0.3. We investigated the limits of current approaches, focusing on the so far neglected impact of the specific choice of homologs included in multiple sequence alignments (MSAs). Results The limits of conservation-based functional residue prediction were explored by surveying the binding sites of 1023 proteins. A straightforward conservation analysis of MSAs composed of randomly selected homologs sampled from a PSI-BLAST search achieves average F-scores of ∼0.3, matching the performance reported by state-of-the-art methods, which often consider additional features in a machine learning setting. Interestingly, we found that a simple combinatorial MSA sampling algorithm will in almost every case produce an MSA with an optimal set of homologs whose conservation analysis reaches average F-scores of ∼0.6, doubling state-of-the-art performance. We also show that this is nearly at the theoretical limit of possible performance given the agreement between different binding site definitions. Additionally, we showcase the progress in this direction made by Selection of Alignment by Maximal Mutual Information (SAMMI), an information-theory-based approach to identifying biologically informative MSAs. This work highlights the importance and the untapped potential of optimally composed MSAs for conservation analysis. Supplementary information Supplementary data are available at Bioinformatics online.
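A conservation-based prediction of the kind benchmarked above can be sketched as: score each MSA column by Shannon entropy (lower entropy means more conserved), flag a fixed fraction of the most conserved columns as functional, and compare against annotated binding residues with the F-score. Shannon entropy is one common conservation measure and is an assumption here, as are the gap handling, the function names and the toy alignment; the sketch does not reproduce the paper's homolog sampling or SAMMI.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of residue frequencies in one MSA column; gaps are ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("inf")  # all-gap column: treat as least conserved
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())


def predict_conserved(msa: List[str], fraction: float) -> Set[int]:
    """Predict the lowest-entropy `fraction` of alignment columns as functional positions."""
    n_cols = len(msa[0])
    entropies = [column_entropy([seq[i] for seq in msa]) for i in range(n_cols)]
    k = max(1, round(fraction * n_cols))
    return set(sorted(range(n_cols), key=lambda i: entropies[i])[:k])


def f_score(predicted: Set[int], actual: Set[int]) -> float:
    """Harmonic mean of precision and recall over predicted vs. annotated positions."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical four-sequence alignment and binding-site annotation (0-based columns).
msa = ["ACDEFGHIKL", "ACDQFGRIKV", "ACNEWGHLKL", "ACDSFGTIRL"]
site = {1, 5, 7}
predicted = predict_conserved(msa, fraction=0.3)
print(sorted(predicted), round(f_score(predicted, site), 3))  # [0, 1, 5] 0.667
```

Because every score depends on which sequences enter `msa`, swapping homologs in or out changes the predicted set, which is the effect the abstract quantifies.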
Affiliation(s)
- Nelson Gil
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
- Andras Fiser
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
11
Thirumal Kumar D, Umer Niazullah M, Tasneem S, Judith E, Susmita B, George Priya Doss C, Selvarajan E, Zayed H. A computational method to characterize the missense mutations in the catalytic domain of GAA protein causing Pompe disease. J Cell Biochem 2018; 120:3491-3505. [DOI: 10.1002/jcb.27624] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/25/2018] [Accepted: 08/14/2018] [Indexed: 12/12/2022]
Affiliation(s)
- D Thirumal Kumar
- Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Maryam Umer Niazullah
- Department of Biomedical Sciences, College of Health and Sciences, Qatar University, Doha, Qatar
- Sadia Tasneem
- Department of Biomedical Sciences, College of Health and Sciences, Qatar University, Doha, Qatar
- E Judith
- Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- B Susmita
- Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- C George Priya Doss
- Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- E Selvarajan
- Department of Genetic Engineering, School of Bioengineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
- Hatem Zayed
- Department of Biomedical Sciences, College of Health and Sciences, Qatar University, Doha, Qatar
12
Castilla IA, Woods DF, Reen FJ, O'Gara F. Harnessing Marine Biocatalytic Reservoirs for Green Chemistry Applications through Metagenomic Technologies. Mar Drugs 2018; 16:E227. [PMID: 29973493 PMCID: PMC6071119 DOI: 10.3390/md16070227] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 05/23/2018] [Revised: 06/13/2018] [Accepted: 06/22/2018] [Indexed: 01/24/2023]
Abstract
In a demanding commercial world, large-scale chemical processes have been widely utilised to satisfy consumer-related needs. Chemical industries are key to promoting economic growth and meeting the requirements of a sustainable industrialised society. The market need for diverse commodities produced by the chemical industry is rapidly expanding globally. Accompanying this demand is an increased threat to the environment and to human health, due to waste produced by increased industrial production. This increased demand has underscored the necessity to increase reaction efficiencies in order to reduce costs and increase profits. The discovery of novel biocatalysts is a key means of combating these difficulties. Metagenomic technology, as a tool for uncovering novel biocatalysts, has great potential and applicability and has already delivered many successes. In this review we discuss recent developments and achievements in the field of biocatalysis. We highlight how green chemistry principles can be successfully promoted and implemented in various industrial sectors through the application of biocatalysis. In addition, we demonstrate how two novel lipases/esterases were mined from the marine environment by metagenomic analysis. Collectively, these improvements can result in increased efficiency, decreased energy consumption, reduced waste and cost savings for the chemical industry.
Affiliation(s)
- Ignacio Abreu Castilla
- BIOMERIT Research Centre, School of Microbiology, University College Cork, T12 K8AF Cork, Ireland
- David F Woods
- BIOMERIT Research Centre, School of Microbiology, University College Cork, T12 K8AF Cork, Ireland
- F Jerry Reen
- School of Microbiology, University College Cork, T12 K8AF Cork, Ireland
- Fergal O'Gara
- BIOMERIT Research Centre, School of Microbiology, University College Cork, T12 K8AF Cork, Ireland
- Telethon Kids Institute, Perth, WA 6008, Australia
- Human Microbiome Programme, School of Pharmacy and Biomedical Sciences, Curtin Health Innovation Research Institute, Curtin University, Perth, WA 6102, Australia
13
Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018; 19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2017] [Accepted: 05/15/2018] [Indexed: 01/16/2023]
Abstract
Background Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking. Results In this paper, we present a sequence-based method for predicting physicochemical interactions at PFSs. The method is based on a functional site and physicochemical interaction-annotated domain profile database, called fiDPD, which was built using protein domains found in the Protein Data Bank. Applied to 13 target proteins from the recent Critical Assessment of Structure Prediction rounds (CASP10/11), the method achieved a Matthews correlation coefficient (MCC) of 0.66 for PFS prediction and 80% recall in predicting the associated physicochemical interactions. Conclusions Our results show that, in addition to the PFSs, the physical interactions at these sites are also conserved during protein evolution. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available and can be accessed at http://202.119.249.49.
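The Matthews correlation coefficient summarizes a per-residue confusion matrix as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from -1 (total disagreement) through 0 (chance) to 1 (perfect prediction). A minimal sketch follows; the counts in the usage line are hypothetical, and returning 0 when a marginal is empty is a common convention rather than something the abstract specifies.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts of functional-site (positive) residues."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a 100-residue protein with 12 annotated functional-site residues.
print(round(matthews_corrcoef(tp=8, tn=85, fp=3, fn=4), 3))
```

MCC is often preferred to accuracy for this task because functional-site residues are a small minority, so a predictor that labels everything negative scores high accuracy but an MCC of 0.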
Affiliation(s)
- Min Han
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Yifan Song
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Jiaqiang Qian
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangsu, 211816, Nanjing, People's Republic of China
14
Song J, Li F, Takemoto K, Haffari G, Akutsu T, Chou KC, Webb GI. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J Theor Biol 2018; 443:125-137. [DOI: 10.1016/j.jtbi.2018.01.023] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 09/12/2017] [Revised: 01/17/2018] [Accepted: 01/18/2018] [Indexed: 10/18/2022]
15
Choudhary P, Kumar S, Bachhawat AK, Pandit SB. CSmetaPred: a consensus method for prediction of catalytic residues. BMC Bioinformatics 2017; 18:583. [PMID: 29273005 PMCID: PMC5741869 DOI: 10.1186/s12859-017-1987-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/19/2017] [Accepted: 12/05/2017] [Indexed: 01/27/2023]
Abstract
Background Knowledge of catalytic residues can play an essential role in elucidating the mechanistic details of an enzyme. However, experimental identification of catalytic residues is tedious and time-consuming, and it can be expedited by computational predictions. Despite significant development in active-site prediction methods, one remaining issue is the ranked position of putative catalytic residues among all ranked residues. To improve both the ranking of catalytic residues and prediction accuracy, we have developed CSmetaPred, a meta-approach-based method. In this approach, residues are ranked by the mean of normalized residue scores derived from four well-known catalytic residue predictors. In a second meta-predictor, CSmetaPred_poc, the mean residue score of CSmetaPred is combined with predicted pocket information to further improve prediction performance. Results Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. Visual and quantitative analysis of the ROC and PR curves shows that the meta-predictors outperform their constituent methods and that CSmetaPred_poc is the best of the evaluated methods. For instance, on the CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves the highest Mean Average Specificity (MAS), a scalar measure for the ROC curve, of 0.97 (0.96). Importantly, the median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. When residues ranked ≤20 are treated as positive predictions in a binary classification, CSmetaPred_poc achieves a prediction accuracy of 0.94 on the CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc places all catalytic residues within the top 20 ranks for ~73% of enzymes. Furthermore, benchmarking on structures built by comparative modelling showed that such models yield better predictions than sequence-based predictions alone.
These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) positions, which can facilitate and expedite their experimental characterization. Conclusions The benchmarking studies showed that employing a meta-approach to combine residue-level scores from well-known catalytic residue predictors can improve prediction accuracy and place known catalytic residues at better ranked positions. Hence, such predictions can help experimentalists prioritize residues for mutational studies aimed at characterizing catalytic residues. Both meta-predictors are available as a web server at: http://14.139.227.206/csmetapred/. Electronic supplementary material The online version of this article (10.1186/s12859-017-1987-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Preeti Choudhary
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
- Shailesh Kumar
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
- Laboratory of Biochemistry and Genetics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Anand Kumar Bachhawat
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India
- Shashi Bhushan Pandit
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, Knowledge City, Sector 81, SAS Nagar, Manuali PO 140306, India.
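CSmetaPred ranks residues by the mean of normalized scores from four predictors and, for binary evaluation, treats the top 20 ranks as positive. A minimal sketch of that ranking step follows. The abstract does not say how scores are normalized, so per-predictor min-max scaling is an assumption here, and the function names and example scores are hypothetical.

```python
from statistics import mean


def min_max_normalize(scores: list[float]) -> list[float]:
    # Assumption: per-predictor min-max scaling to [0, 1]; the abstract does
    # not specify which normalization CSmetaPred actually uses.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def meta_rank(predictor_scores: list[list[float]]) -> list[int]:
    """Residue indices ordered best-first by their mean normalized score.

    Each inner list holds one predictor's scores, one per residue, in
    sequence order; all inner lists must have the same length.
    """
    normalized = [min_max_normalize(s) for s in predictor_scores]
    n_residues = len(predictor_scores[0])
    mean_scores = [mean([col[i] for col in normalized]) for i in range(n_residues)]
    return sorted(range(n_residues), key=lambda i: mean_scores[i], reverse=True)


def top_k_positive(ranking: list[int], k: int = 20) -> set[int]:
    # Residues ranked <= k are called catalytic in the binary setting.
    return set(ranking[:k])
```

Normalizing before averaging keeps a predictor whose raw scores span a wide range from dominating the consensus. With two hypothetical predictors scoring four residues as `[0.1, 0.9, 0.5, 0.3]` and `[2, 10, 8, 0]`, `meta_rank` returns `[1, 2, 3, 0]`.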
16
Abstract
Protein-ligand binding site prediction methods aim to predict protein-ligand interactions, putative ligands, and ligand binding site residues from the amino acid sequence, using sequence information, structural information, or a combination of both. In silico characterization of protein-ligand interactions has become extremely important for determining a protein's function, because in vivo functional elucidation cannot keep pace with the current growth of sequence databases. Additionally, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analyses such as drug discovery. Thus, in silico prediction of protein-ligand interactions must be used to aid functional elucidation. Here, we briefly discuss protein function prediction, prediction of protein-ligand interactions, and the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and Continuous Automated Model EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss in detail our cutting-edge web-server method, FunFOLD, for the structurally informed prediction of protein-ligand interactions. Furthermore, we provide a step-by-step guide to using the FunFOLD web server and the downloadable FunFOLD3 application, along with some real-world examples in which the FunFOLD methods have been used to aid functional elucidation.
17
Zang P, Gong A, Zhang P, Yu J. Targeting druggable enzymome by exploiting natural medicines: An in silico-in vitro integrated approach to combating multidrug resistance in bacterial infection. Pharm Biol 2015; 54:604-618. [PMID: 26681298 DOI: 10.3109/13880209.2015.1068338] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
CONTEXT Antibiotic resistance is a major clinical and public health problem. Developing new therapeutic approaches to prevent bacterial multidrug resistance during antimicrobial chemotherapy has thus become a primary consideration in the medicinal chemistry community. OBJECTIVE We describe a new strategy that combats multidrug resistance by using natural medicines to target the druggable enzymome (i.e., enzymatic proteome) of Staphylococcus aureus. MATERIALS AND METHODS A pipeline integrating in silico analysis and in vitro assays was proposed to identify antibacterial agents from a large library of natural products with diverse structures, high drug-likeness, and relatively low flexibility. With this pipeline, a systematic interactome of 826 natural product candidates with 125 functionally essential S. aureus enzymes was constructed via a high-throughput cross-docking approach. The resulting docking score matrix was then converted into an array of synthetic scores, each corresponding to one natural product candidate. Systematic examination of the docking results suggested a number of highly promising candidates with potent antibacterial activity. RESULTS Three natural products, radicicol, jorumycin, and amygdalin, were found to possess strong broad-spectrum potency against both drug-resistant and drug-sensitive strains (MIC <10 μg/ml). In addition, some natural products, such as tetrandrine, bilobalide, and arbutin, exhibited selective inhibition of different strains. DISCUSSION AND CONCLUSION Combined quantum mechanics/molecular mechanics analysis revealed diverse non-bonded interactions across the complex interfaces of the newly identified antibacterial agents with their putative targets, GyrB ATPase and tyrosyl-tRNA synthetase.
Affiliation(s)
- Ping Zang
- Department of Public Health Management, The Affiliated Hospital of Weifang Medical University, Weifang, China
- Aijie Gong
- Department of Central Sterile Supply, Changyi People's Hospital, Changyi, China
- Jinling Yu
- Department of Gynaecology, The Affiliated Hospital of Weifang Medical University, Weifang, China
18
Roche DB, Brackenridge DA, McGuffin LJ. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods. Int J Mol Sci 2015; 16:29829-42. [PMID: 26694353 PMCID: PMC4691145 DOI: 10.3390/ijms161226202] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 09/20/2015] [Revised: 12/02/2015] [Accepted: 12/10/2015] [Indexed: 01/14/2023]
Abstract
Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.
Affiliation(s)
- Daniel Barry Roche
- Institut de Biologie Computationnelle, LIRMM, CNRS, Université de Montpellier, Montpellier 34095, France.
- Centre de Recherche de Biochimie Macromoléculaire, CNRS-UMR 5237, Montpellier 34293, France.
19
Aubailly S, Piazza F. Cutoff lensing: predicting catalytic sites in enzymes. Sci Rep 2015; 5:14874. [PMID: 26445900 PMCID: PMC4597221 DOI: 10.1038/srep14874] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/28/2015] [Accepted: 09/10/2015] [Indexed: 01/12/2023]
Abstract
Predicting function-related amino acids in proteins of unknown function, or unknown allosteric binding sites in drug-targeted proteins, is a task of paramount importance in molecular biomedicine. In this paper we introduce a simple, light and computationally inexpensive structure-based method to identify catalytic sites in enzymes. Our method, termed cutoff lensing, is a general procedure that consists of letting the cutoff used to build an elastic network model increase to large values. Validation of our method against a large database of annotated enzymes shows that there exist optimal cutoff values at which three different structure-based indicators recover the largest fraction of the known catalytic sites. Interestingly, we find that the larger the structure, the greater the predictive power afforded by our method. Possible ways to combine the three indicators into a single figure of merit and into a specific sequential analysis are suggested and discussed with reference to the classic case of HIV protease. Our method could be used as a complement to other sequence- and/or structure-based methods to narrow down the results of large-scale screenings.
Affiliation(s)
- Simon Aubailly
- Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
- Francesco Piazza
- Université d'Orléans, Centre de Biophysique Moléculaire, CNRS-UPR4301, Rue C. Sadron, 45071, Orléans, France
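Cutoff lensing starts from an elastic network model whose key parameter is a distance cutoff. A minimal sketch of the standard Gaussian network model connectivity (Kirchhoff) matrix shows how raising that cutoff densifies the network. It assumes one node per residue placed at its C-alpha coordinate; it does not reproduce the paper's three structure-based indicators, which the abstract does not define, and the coordinates in the example are made up.

```python
import math

Coord = tuple[float, float, float]


def kirchhoff_matrix(coords: list[Coord], cutoff: float) -> list[list[int]]:
    """Gaussian network model Kirchhoff (connectivity) matrix.

    An off-diagonal entry is -1 when two residues lie within `cutoff`
    (e.g. in angstroms) of each other; each diagonal entry is that residue's
    contact count, so every row sums to zero.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


def degrees(gamma: list[list[int]]) -> list[int]:
    # Contact count of each residue, read off the diagonal.
    return [gamma[i][i] for i in range(len(gamma))]


# Hypothetical residues on a line; sweep the cutoff upward.
points = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
for rc in (5.0, 13.0, 25.0):
    print(rc, degrees(kirchhoff_matrix(points, rc)))
```

As the cutoff grows from 5 to 13 to 25, the contact counts go from `[1, 2, 1, 0]` to `[2, 2, 3, 1]` to `[3, 3, 3, 3]`: the network eventually becomes fully connected, which is why the paper searches for an optimal intermediate cutoff rather than simply taking the largest.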
20
PINGU: PredIction of eNzyme catalytic residues usinG seqUence information. PLoS One 2015; 10:e0135122. [PMID: 26261982 PMCID: PMC4532418 DOI: 10.1371/journal.pone.0135122] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 03/28/2015] [Accepted: 07/17/2015] [Indexed: 11/19/2022]
Abstract
Identification of catalytic residues can help unveil interesting attributes of enzyme function for various therapeutic and industrial applications. Depending on their biochemical roles, enzymes vary in sequence length and in their number of catalytic residues. This article describes a prediction approach (PINGU) for such a scenario. It uses models trained on the physicochemical properties and evolutionary information of 650 non-redundant enzymes (2136 catalytic residues) in a support vector machine architecture. Independent testing on 200 non-redundant enzymes (683 catalytic residues) in predefined prediction settings, i.e., with the number of non-catalytic residues per catalytic residue ranging from 1 to 30, suggested that the prediction approach was highly sensitive and specific (80% or above) across these increasingly difficult settings. To learn more about the discriminatory power of PINGU in real scenarios, where the prediction challenge is variable and susceptible to high false-positive rates, the best model from independent testing was applied to 60 diverse enzymes. The results suggested that PINGU was able to identify most catalytic and non-catalytic residues correctly, with accuracy, sensitivity and specificity of 80% or above. The effect of false positives on precision was addressed by applying predicted ligand-binding residue information as a post-processing filter. This yielded an overall improvement of 20% in F-measure and 0.138 in correlation coefficient, with 16% higher precision. On account of its encouraging performance, PINGU may eventually find applications in enzyme engineering and novel drug discovery.
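The PINGU abstract notes that, as the ratio of non-catalytic to catalytic residues grows, predictions become susceptible to false positives even when sensitivity and specificity stay around 80%. The standard confusion-matrix arithmetic below (not taken from the paper; function names are illustrative) makes the reason concrete: with P positives and N = ratio × P negatives, TP = sensitivity × P and FP = (1 − specificity) × N, so P cancels out of precision.

```python
def expected_precision(sensitivity: float, specificity: float, ratio: float) -> float:
    """Expected precision with `ratio` negatives (non-catalytic) per positive.

    TP = sensitivity * P and FP = (1 - specificity) * ratio * P,
    so precision = TP / (TP + FP) no longer depends on P.
    """
    tp = sensitivity
    fp = (1.0 - specificity) * ratio
    return tp / (tp + fp)


def f_measure(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall (F1).
    return 2 * precision * recall / (precision + recall)


for r in (1, 5, 10, 30):
    p = expected_precision(0.8, 0.8, r)
    print(r, round(p, 3), round(f_measure(p, 0.8), 3))
```

With sensitivity and specificity both at 0.8, precision is about 0.80 at a 1:1 ratio but only about 0.12 at 1:30. This is why a post-processing filter that removes false positives, such as the ligand-binding filter described above, can raise precision and F-measure without retraining the classifier.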
21
Fang C, Noguchi T, Yamana H. Analysis of evolutionary conservation patterns and their influence on identifying protein functional sites. J Bioinform Comput Biol 2015; 12:1440003. [PMID: 25362840 DOI: 10.1142/s0219720014400034] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Evolutionary conservation information contained in position-specific scoring matrices (PSSMs) has been widely adopted by sequence-based methods for identifying protein functional sites, because all functional sites, whether in ordered or disordered proteins, are conserved to some extent. However, different functional sites have different conservation patterns: some are linear and contextual, some are interspersed with highly variable residues, and others appear to be conserved independently. Each value in a PSSM is calculated independently of the others and carries no contextual information about neighboring residues in the sequence. Therefore, using the raw PSSM output directly for prediction ignores the relationship between the conservation patterns of residues and the distribution of conservation scores in the PSSM. To demonstrate the importance of combining PSSMs with the specific conservation patterns of functional sites, three different PSSM-based methods for identifying three kinds of functional sites were analyzed. The results suggest that different PSSM-based methods differ in their ability to identify different patterns of functional sites, and that better combining PSSMs with the specific conservation patterns of residues would greatly facilitate prediction.
Affiliation(s)
- Chun Fang
- Department of Computer Science and Engineering, Shandong University of Technology, Shandong 255049, P. R. China
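The abstract's point that PSSM values carry no sequence context is commonly addressed in sequence-based site predictors by giving the classifier a sliding window of PSSM rows around each residue. The sketch below shows that generic encoding; it is not one of the three methods analyzed in the paper, and the function name and window size are assumptions for illustration.

```python
def windowed_pssm_features(pssm: list[list[float]], half_window: int) -> list[list[float]]:
    """Concatenate the PSSM rows of positions i-w .. i+w for every residue i.

    `pssm` has one row per residue (typically 20 columns, one per amino acid).
    Positions that fall beyond either end of the sequence are zero-padded,
    so every residue gets a feature vector of the same length.
    """
    n_cols = len(pssm[0])
    padding = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        row: list[float] = []
        for j in range(i - half_window, i + half_window + 1):
            row.extend(pssm[j] if 0 <= j < len(pssm) else padding)
        features.append(row)
    return features
```

Each residue's vector has (2w + 1) × 20 entries for a standard 20-column PSSM, so a window of w = 3 yields 140 features. Wider windows capture longer contextual conservation patterns at the cost of more features per residue.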
22
Abstract
Faced with a protein engineering challenge, a contemporary researcher can choose from myriad design strategies. Library-scale computational protein design (LCPD) is a hybrid method suitable for the engineering of improved protein variants with diverse sequences. This chapter discusses the background and merits of several practical LCPD techniques. First, LCPD methods suitable for delocalized protein design are presented in the context of example design calculations for cellobiohydrolase II. Second, localized design methods are discussed in the context of an example design calculation intended to shift the substrate specificity of a ketol-acid reductoisomerase Rossmann domain from NADPH to NADH.
23
Mills CL, Beuning PJ, Ondrechen MJ. Biochemical functional predictions for protein structures of unknown or uncertain function. Comput Struct Biotechnol J 2015; 13:182-91. [PMID: 25848497 PMCID: PMC4372640 DOI: 10.1016/j.csbj.2015.02.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 12/03/2014] [Revised: 02/06/2015] [Accepted: 02/11/2015] [Indexed: 01/07/2023]
Abstract
With the exponential growth in the determination of protein sequences and structures via genome sequencing and structural genomics efforts, there is a growing need for reliable computational methods to determine the biochemical function of these proteins. This paper reviews the efforts to address the challenge of annotating the function at the molecular level of uncharacterized proteins. While sequence- and three-dimensional-structure-based methods for protein function prediction have been reviewed previously, the recent trends in local structure-based methods have received less attention. These local structure-based methods are the primary focus of this review. Computational methods have been developed to predict the residues important for catalysis and the local spatial arrangements of these residues can be used to identify protein function. In addition, the combination of different types of methods can help obtain more information and better predictions of function for proteins of unknown function. Global initiatives, including the Enzyme Function Initiative (EFI), COMputational BRidges to EXperiments (COMBREX), and the Critical Assessment of Function Annotation (CAFA), are evaluating and testing the different approaches to predicting the function of proteins of unknown function. These initiatives and global collaborations will increase the capability and reliability of methods to predict biochemical function computationally and will add substantial value to the current volume of structural genomics data by reducing the number of absent or inaccurate functional annotations.
Affiliation(s)
- Caitlyn L Mills
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
- Penny J Beuning
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
- Mary Jo Ondrechen
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
24
Structure and dynamics studies of sterol 24-C-methyltransferase with mechanism based inactivators for the disruption of ergosterol biosynthesis. Mol Biol Rep 2014; 41:4279-93. [DOI: 10.1007/s11033-014-3299-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 06/21/2013] [Accepted: 02/13/2014] [Indexed: 11/25/2022]
25
Janda JO, Meier A, Merkl R. CLIPS-4D: a classifier that distinguishes structurally and functionally important residue-positions based on sequence and 3D data. Bioinformatics 2013; 29:3029-35. [PMID: 24048358 DOI: 10.1093/bioinformatics/btt519] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION The precise identification of functionally and structurally important residues of a protein is still an open problem, and state-of-the-art classifiers predict only one or at most two different categories. RESULT We have implemented the classifier CLIPS-4D, which predicts, in a mutually exclusive manner, a role in catalysis, ligand binding or protein stability for each residue position of a protein. Each prediction is assigned a P-value, which enables statistical assessment and the selection of predictions of similar quality. CLIPS-4D requires as input a multiple sequence alignment and a 3D structure of one protein in PDB format. A comparison with existing methods confirmed state-of-the-art prediction quality, even though CLIPS-4D classifies more specifically than other methods. CLIPS-4D was implemented as a multiclass support vector machine that exploits seven sequence-based and two structure-based features, each of which was shown to contribute to classification quality. The classification of ligand-binding sites profited most from the 3D features, namely the solvent-accessible surface area and the identification of surface pockets. In contrast, five additionally tested 3D features did not improve on the classification performance achieved with evolutionary signals deduced from the multiple sequence alignment.
Affiliation(s)
- Jan-Oliver Janda
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040 Regensburg, Germany and Faculty of Mathematics and Computer Science, University of Hagen, D-58084 Hagen, Germany
26
Hecht M, Bromberg Y, Rost B. News from the protein mutability landscape. J Mol Biol 2013; 425:3937-48. [PMID: 23896297 DOI: 10.1016/j.jmb.2013.07.028] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 04/30/2013] [Revised: 07/08/2013] [Accepted: 07/19/2013] [Indexed: 12/16/2022]
Abstract
Some mutations of protein residues matter more than others, and these are often conserved evolutionarily. The explosion of deep sequencing and genotyping increasingly requires the distinction between effect and neutral variants. The simplest approach predicts all mutations of conserved residues to have an effect; however, this works poorly, at best. Many computational tools that are optimized to predict the impact of point mutations provide more detail. Here, we expand the perspective from the view of single variants to the level of sketching the entire mutability landscape. This landscape is defined by the impact of substituting every residue at each position in a protein by each of the 19 non-native amino acids. We review some of the powerful conclusions about protein function, stability and their robustness to mutation that can be drawn from such an analysis. Large-scale experimental and computational mutagenesis experiments are increasingly furthering our understanding of protein function and of the genotype-phenotype associations. We also discuss how these can be used to improve predictions of protein function and pathogenicity of missense variants.
Affiliation(s)
- Maximilian Hecht
- Department of Bioinformatics and Computational Biology I12, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany.
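The Hecht, Bromberg and Rost abstract defines the mutability landscape as the effect of substituting every residue by each of the 19 non-native amino acids. A minimal sketch that enumerates exactly those variants, the input a landscape-scale predictor or deep mutational scan would need to score, is shown below. The scoring step itself is not included, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_point_mutants(sequence: str):
    """Yield (position, wild_type, mutant, mutated_sequence) for every
    substitution of each residue by each of the 19 non-native amino acids.

    Positions are 1-based, matching the usual 'A23G' variant notation.
    """
    for i, wt in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield i + 1, wt, aa, sequence[:i] + aa + sequence[i + 1:]


for pos, wt, aa, _ in list(single_point_mutants("MKT"))[:3]:
    print(f"{wt}{pos}{aa}")  # first few variant labels
```

The landscape grows linearly with length, L × 19 variants: a 300-residue protein already requires 5,700 single-point variants to be scored.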
27
Zhu Y, Zhou W, Dai DQ, Yan H. Identification of DNA-binding and protein-binding proteins using enhanced graph wavelet features. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1017-1031. [PMID: 24334394 DOI: 10.1109/tcbb.2013.117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/03/2023]
Abstract
Interactions between biomolecules play an essential role in various biological processes. For predicting DNA-binding or protein-binding proteins, many machine-learning-based techniques have used various types of features to represent the interfaces of complexes, but they consider only the properties of single atoms in the interface and do not directly take into account information from neighboring atoms. This paper proposes a new feature representation method for biomolecular interfaces based on graph wavelet theory. The enhanced graph wavelet features (EGWF) provide an effective way to characterize interface features by adding physicochemical features and exploiting a graph wavelet formulation. In particular, the graph wavelet condenses the information around the center atom and thus enhances the discrimination of biomolecule-binding proteins in the feature space. Experimental results show that EGWF performs effectively for predicting DNA-binding and protein-binding proteins in terms of the Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (AUC).
Affiliation(s)
- Yuan Zhu
- Guangdong University of Finance and Economics, Guangzhou and Sun Yat-Sen University, Guangzhou
- Hong Yan
- City University of Hong Kong, Hong Kong and University of Sydney, Sydney
28
Kirshner DA, Nilmeier JP, Lightstone FC. Catalytic site identification--a web server to identify catalytic site structural matches throughout PDB. Nucleic Acids Res 2013; 41:W256-65. [PMID: 23680785 PMCID: PMC3692059 DOI: 10.1093/nar/gkt403] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/11/2023]
Abstract
The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov.
Affiliation(s)
- Felice C. Lightstone
- *To whom correspondence should be addressed. Tel: +1 925 423 8657; Fax: +1 925 423 0785;
29
Nilmeier JP, Kirshner DA, Wong SE, Lightstone FC. Rapid catalytic template searching as an enzyme function prediction procedure. PLoS One 2013; 8:e62535. [PMID: 23675414 PMCID: PMC3651201 DOI: 10.1371/journal.pone.0062535] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 01/31/2013] [Accepted: 03/22/2013] [Indexed: 11/18/2022]
Abstract
We present an enzyme function identification algorithm, Catalytic Site Identification (CatSId), based on the identification of catalytic residues. The method is optimized for highly accurate template identification across a diverse template library and is also very efficient with regard to time and scalability of comparisons. The algorithm matches three-dimensional residue arrangements in a query protein to a library of manually annotated catalytic residues, the Catalytic Site Atlas (CSA). Two main processes are involved. The first is a rapid protein-to-template matching algorithm that scales quadratically with target protein size and linearly with template size. The second incorporates a number of physical descriptors, including binding site predictions, into a logistic scoring procedure that re-scores the matches found in the first process. This approach shows very good overall performance, with a Receiver Operating Characteristic Area Under the Curve (AUC) of 0.971 for the training set evaluated. The procedure handles cofactors, ions, nonstandard residues, and point substitutions for residues and ions in a robust and integrated fashion. Sites with only two critical (catalytic) residues are challenging cases, resulting in AUCs of 0.9411 and 0.5413 for the training and test sets, respectively. The remaining sites, i.e. templates with more than two critical (catalytic) residues, show excellent performance, with AUCs greater than 0.90 for both the training and test data. The procedure shows considerable promise for larger-scale searches.
Affiliation(s)
- Jerome P. Nilmeier
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Daniel A. Kirshner
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Sergio E. Wong
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Felice C. Lightstone
- Biosciences and Biotechnology Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, United States of America
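CatSId reports its performance as ROC AUC values, including 0.5413 on the test set for two-residue templates, which is close to the 0.5 expected from random scoring. A minimal sketch of how AUC can be computed directly from scores follows, using the rank-statistic definition (the probability that a randomly chosen true match outscores a randomly chosen false one, with ties counted as one half). The scores in the example are hypothetical and are not CatSId's logistic outputs.

```python
def roc_auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count one half. This pairwise form is O(n * m), which is fine
    for small evaluation sets.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical match scores for true and false template hits.
print(roc_auc([0.9, 0.3], [0.5, 0.1]))
```

Here three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75. Because AUC depends only on the ordering of scores, it is unaffected by any monotone rescaling, such as the logistic re-scoring step described above.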
30
Yu DJ, Hu J, Tang ZM, Shen HB, Yang J, Yang JY. Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.10.012] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 01/07/2023]
31
Dutta T, Banerjee S, Soren D, Lahiri S, Sengupta S, Rasquinha JA, Ghosh AK. Regulation of Enzymatic Activity by Deamidation and Their Subsequent Repair by Protein l-isoaspartyl Methyl Transferase. Appl Biochem Biotechnol 2012; 168:2358-75. [DOI: 10.1007/s12010-012-9942-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/11/2012] [Accepted: 10/05/2012] [Indexed: 01/19/2023]
32
Han L, Zhang YJ, Song J, Liu MS, Zhang Z. Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues. PLoS One 2012; 7:e41370. [PMID: 22829945 PMCID: PMC3400608 DOI: 10.1371/journal.pone.0041370] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/22/2012] [Accepted: 06/20/2012] [Indexed: 11/18/2022]
Abstract
Enzymes play a fundamental role in almost all biological processes, and identifying catalytic residues is a crucial step in deciphering their biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrates the microenvironment (ME) and geometrical properties of amino acid residues. First, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed MEscore, by summing the likelihoods of all residue pairs. Second, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, given that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature as an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation, MEDscore achieved robust performance in identifying catalytic residues, with an AUC1.0 of 0.889. At a false positive rate of ≤10%, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore achieved performance competitive with the residue conservation score (e.g. CONscore), the most informative single feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first single structural feature exhibiting such an advantage. More importantly, we found that MEDscore is complementary to CONscore, and significantly improved performance can be achieved by combining CONscore with MEDscore linearly. As an implementation of this work, MEDscore has been made freely accessible at http://protein.cau.edu.cn/mepi/.
Affiliation(s)
- Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
- Yong-Jun Zhang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, People's Republic of China
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Ming S. Liu
- CSIRO - Mathematics, Informatics and Statistics, Clayton, Victoria, Australia
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
33
Zhang YN, Yu DJ, Li SS, Fan YX, Huang Y, Shen HB. Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features. BMC Bioinformatics 2012; 13:118. [PMID: 22651691 PMCID: PMC3424114 DOI: 10.1186/1471-2105-13-118] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 12/05/2011] [Accepted: 05/31/2012] [Indexed: 12/23/2022]
Abstract
Background: Adenosine-5′-triphosphate (ATP) is a multifunctional nucleotide that plays an important role in cell biology as a coenzyme interacting with proteins. Identifying the sites where proteins bind ATP is important for understanding protein function and the mechanisms of protein-ATP complexes. Results: In this paper, we propose a novel framework for predicting the functional residues through which proteins bind ATP molecules. The prediction protocol combines sequence evolutionary information, bi-profile sampling of multi-view sequential features, and sequence-derived structural features. The hypothesis behind this strategy is that a single-view feature can represent only part of the knowledge about the target, whereas multiple sources of descriptors can be complementary. Conclusions: Prediction performance evaluated by both 5-fold and leave-one-out jackknife cross-validation tests on two benchmark datasets, consisting of 168 and 227 non-homologous ATP-binding proteins respectively, demonstrates the efficacy of the proposed protocol. Our experimental results also reveal that the structural characteristics of real protein-ATP binding residues differ significantly from those of non-binding residues. For example, binding residues do not show high solvent accessibility propensities, and binding tends to occur at the junctions between different secondary structure segments. Furthermore, tests with multiple ratios of positive to negative samples show that performance is affected by imbalance in the training datasets. Increasing the dataset size is also shown to improve prediction performance.
Affiliation(s)
- Ya-Nan Zhang
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
34
FunFOLDQA: a quality assessment tool for protein-ligand binding site residue predictions. PLoS One 2012; 7:e38219. [PMID: 22666491 PMCID: PMC3364224 DOI: 10.1371/journal.pone.0038219] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 12/12/2011] [Accepted: 05/01/2012] [Indexed: 11/19/2022]
Abstract
The estimation of prediction quality is important because, without quality measures, it is difficult to determine the usefulness of a prediction. Currently, methods for ligand binding site residue prediction are assessed in the function prediction category of the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment, using the Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) metrics. However, assessing ligand binding site predictions with these metrics requires solved structures with bound ligands. We have therefore developed a ligand binding site quality assessment tool, FunFOLDQA, which uses protein feature analysis to predict ligand binding site quality before the experimental solution of the protein structures and their ligand interactions. The FunFOLDQA feature scores were combined using simple linear combinations, multiple linear regression and a neural network. The neural network produced significantly better correlations with both the MCC and BDT scores, according to Kendall's τ, Spearman's ρ and Pearson's r correlation coefficients, when tested on both the CASP8 and CASP9 datasets. The neural network also produced the largest Area Under the Curve (AUC) score when Receiver Operating Characteristic (ROC) analysis was undertaken for the CASP8 dataset. Furthermore, the FunFOLDQA algorithm incorporating the neural network is shown to add value to FunFOLD when both methods are employed in combination. This results in a statistically significant improvement over all of the best server methods, the FunFOLD method (6.43%), and one of the top manual groups (FN293) tested on the CASP8 dataset. The FunFOLDQA method was also found to be competitive with the top server methods when tested on the CASP9 dataset.
To the best of our knowledge, FunFOLDQA is the first attempt to develop a method that can be used to assess ligand binding site prediction quality, in the absence of experimental data.
35
Dou Y, Wang J, Yang J, Zhang C. L1pred: a sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier. PLoS One 2012; 7:e35666. [PMID: 22558194 PMCID: PMC3338704 DOI: 10.1371/journal.pone.0035666] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 01/04/2012] [Accepted: 03/19/2012] [Indexed: 12/01/2022]
Abstract
To understand enzyme functions, identifying the catalytic residues is usually the first step. Knowledge of catalytic residues is also useful for protein engineering and drug design. However, identifying catalytic residues experimentally remains challenging because of the time and cost involved. Computational methods have therefore been explored to predict catalytic residues. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, which uses the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on the carefully designed datasets Data604 and Data63. With ten-fold cross-validation on the training dataset, Data604, L1pred achieved an area under the precision-recall curve (AUPR) of 0.2198 and an area under the ROC curve (AUC) of 0.9494. On the independent test dataset, Data63, it achieved AUPR and AUC values of 0.2636 and 0.9375, respectively. Compared with other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm and found that all the scores contributed roughly equally to L1pred's performance.
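The ROC AUC reported above can be read as the probability that a randomly chosen catalytic residue receives a higher score than a randomly chosen non-catalytic one, with ties counted as half. The sketch below computes AUC that way (the Mann-Whitney formulation). It is generic evaluation code with invented scores, not L1pred itself, and it does not implement the L1-logreg classifier.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative).

    labels: 1 for catalytic residues, 0 otherwise. Ties count as half a win.
    Quadratic in the number of residues; fine for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

AUC does not depend on how rare the positives are. AUPR, by contrast, has a baseline equal to the positive fraction. The large gap between the reported AUC (about 0.94) and AUPR (about 0.22 to 0.26) is consistent with catalytic residues being a small minority of all residues.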
Affiliation(s)
- Yongchao Dou
- School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, Nebraska, United States of America
- Jun Wang
- Scientific Computing Key Laboratory of Shanghai Universities, Shanghai, People’s Republic of China
- Department of Mathematics, Shanghai Normal University, Shanghai, People’s Republic of China
- Jialiang Yang
- MPI-Institute of Computational Biology, Chinese Academy of Sciences, Shanghai, People’s Republic of China
- Chi Zhang
- School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, Nebraska, United States of America
36
Shen YQ, Bonnot F, Imsand EM, RoseFigura JM, Sjölander K, Klinman JP. Distribution and properties of the genes encoding the biosynthesis of the bacterial cofactor, pyrroloquinoline quinone. Biochemistry 2012; 51:2265-75. [PMID: 22324760 DOI: 10.1021/bi201763d] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Indexed: 01/24/2023]
Abstract
Pyrroloquinoline quinone (PQQ) is a small, redox active molecule that serves as a cofactor for several bacterial dehydrogenases, introducing pathways for carbon utilization that confer a growth advantage. Early studies had implicated a ribosomally translated peptide as the substrate for PQQ production. This study presents a sequence- and structure-based analysis of the components of the pqq operon. We find the necessary components for PQQ production are present in 126 prokaryotes, most of which are Gram-negative and a number of which are pathogens. A total of five gene products, PqqA, PqqB, PqqC, PqqD, and PqqE, are identified as being obligatory for PQQ production. Three of the gene products in the pqq operon, PqqB, PqqC, and PqqE, are members of large protein superfamilies. By combining evolutionary conservation patterns with information from three-dimensional structures, we are able to differentiate the gene products involved in PQQ biosynthesis from those with divergent functions. The observed persistence of a conserved gene order within analyzed operons strongly suggests a role for protein-protein interactions in the course of cofactor biosynthesis. These studies propose previously unidentified roles for several of the gene products, as well as identifying possible new targets for antibiotic design and application.
Affiliation(s)
- Yao-Qing Shen
- Department of Chemistry, University of California, Berkeley, California 94720, United States
37
Chakraborty S, Minda R, Salaye L, Bhattacharjee SK, Rao BJ. Active site detection by spatial conformity and electrostatic analysis--unravelling a proteolytic function in shrimp alkaline phosphatase. PLoS One 2011; 6:e28470. [PMID: 22174814 PMCID: PMC3234256 DOI: 10.1371/journal.pone.0028470] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 07/27/2011] [Accepted: 11/08/2011] [Indexed: 11/30/2022]
Abstract
Computational methods are increasingly important as an aid in identifying active sites. Most of these methods use structural information to supplement analyses based on sequence conservation. The development of tools that compute electrostatic potentials has further improved our ability to characterize active-site residues in proteins. We describe a computational methodology for detecting active sites based on structural and electrostatic conformity: CataLytic Active Site Prediction (CLASP). In our pipelined model, the physical 3D signature of a particular enzymatic function, as defined by its active site, is used to obtain spatially congruent matches. Previous work has revealed that catalytic residues have large pKa deviations from standard values. Here we show that, for a given enzymatic activity, the electrostatic potential difference (PD) between analogous residue pairs is similar across the active sites of different proteins from the same family. False positives among the spatially congruent matches are further pruned by PD analysis, in which cognate pairs with large deviations are rejected. We first present the results of active site prediction by CLASP for two enzymatic activities, β-lactamases and serine proteases, which are among the most extensively investigated enzymes. We also present the results of CLASP analysis on motifs extracted from the Catalytic Site Atlas (CSA) to demonstrate its ability to accurately classify any protein of known structure, putative or otherwise. The source code and database are available at www.sanchak.com/clasp/. Subsequently, we probed alkaline phosphatases (AP), well-known promiscuous enzymes, for additional activities. This search led us to predict a hitherto unknown function of shrimp alkaline phosphatase (SAP), in which the protein acts as a protease. Finally, we present experimental evidence supporting the CLASP prediction by showing that SAP indeed has protease activity in vitro.
Affiliation(s)
- Sandeep Chakraborty
- Department of Biological Sciences, Tata Institute of Fundamental Research, Mumbai, India
38
Gaston D, Susko E, Roger AJ. A phylogenetic mixture model for the identification of functionally divergent protein residues. Bioinformatics 2011; 27:2655-63. [PMID: 21840876 DOI: 10.1093/bioinformatics/btr470] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION: To understand the evolution of molecular function within protein families, it is important to identify the amino acid residues responsible for functional divergence, i.e. the sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, from both phylogenetic and information-theoretic angles. However, evaluation of these methods has typically relied on a handful of reasonably well-characterized biological datasets or on analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples. RESULTS: We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions. AVAILABILITY: http://rogerlab.biochem.dal.ca/Software CONTACT: andrew.roger@dal.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
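A mixture-model classifier of this kind ultimately reports, for each site, the posterior probability that it belongs to the functional-divergence component. The generic Bayes step for a two-component mixture is sketched below in log space. This is not FunDi's implementation. The per-site log-likelihoods and the prior weight are hypothetical inputs that, in a real analysis, would come from fitting phylogenetic models to the subfamily alignments.

```python
import math

def posterior_fd(loglik_fd, loglik_null, prior_fd):
    """Posterior probability that a site belongs to the functional-divergence
    (FD) component of a two-component mixture.

    loglik_fd / loglik_null: natural-log likelihoods of the site's data under
    the FD model and under the null (no divergence) model.
    prior_fd: mixture weight of the FD component, strictly between 0 and 1.
    """
    if not 0.0 < prior_fd < 1.0:
        raise ValueError("prior_fd must lie strictly between 0 and 1")
    # Posterior log-odds = prior log-odds + log-likelihood ratio.
    log_odds = (math.log(prior_fd) - math.log1p(-prior_fd)
                + loglik_fd - loglik_null)
    # Numerically stable logistic function (avoids overflow in math.exp).
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)

# Hypothetical site: data twice as likely under the FD model, 50:50 prior.
print(posterior_fd(math.log(0.02), math.log(0.01), 0.5))
```

Working with log-likelihoods matters here, because site likelihoods under phylogenetic models are usually far too small to multiply directly.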
Affiliation(s)
- Daniel Gaston
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Canada, B3H 1X5
39
Sjölander K, Datta RS, Shen Y, Shoffner GM. Ortholog identification in the presence of domain architecture rearrangement. Brief Bioinform 2011; 12:413-22. [PMID: 21712343 PMCID: PMC3178056 DOI: 10.1093/bib/bbr036] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
Abstract
Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets sharing a common domain architecture or which share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area.
Affiliation(s)
- Kimmen Sjölander
- 308C Stanley Hall #1762, Department of Bioengineering, University of California, Berkeley, CA 94720, USA.
40
Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics 2011; 12:151. [PMID: 21569468 PMCID: PMC3113940 DOI: 10.1186/1471-2105-12-151] [Citation(s) in RCA: 410] [Impact Index Per Article: 29.3] [Received: 12/27/2010] [Accepted: 05/13/2011] [Indexed: 12/31/2022]
Abstract
Background: The rational design of modified proteins with controlled stability is extremely important in a wide range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also come from an improved understanding of the effect of naturally occurring disease-causing mutations at the molecular level. Results: PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single-site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC shows good prediction performance (a correlation coefficient of 0.8 between predicted and measured stability changes in cross-validation, after exclusion of 10% outliers). It is also very fast, predicting the stability changes resulting from all possible mutations in a medium-sized protein in less than a minute. This unique functionality is implemented in a user-friendly way in PoPMuSiC and is particularly easy to exploit. Another new functionality of our server is the estimation of the optimality of each amino acid in the sequence with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. clusters of non-optimal residues, which are particularly interesting sites for introducing targeted mutations. These sequence optimality data are also expected to have significant implications for the prediction and analysis of particular structural or functional protein regions. To illustrate the usefulness of this new functionality, we apply it to a dataset of known catalytic sites. We find a much larger than average concentration of structural weaknesses at these sites, which quantifies how they have been optimized for function rather than stability.
Conclusion: The freely available PoPMuSiC-2.1 web server is highly useful for rapidly identifying a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.
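The abstract describes the predicted stability change as a linear combination of statistical potentials whose weights depend on the solvent accessibility of the mutated residue. One way to write such a model is ΔΔG ≈ Σ_i α_i(A)·ΔW_i + α_0(A), where A is the relative solvent accessibility. The sketch below shows only that structure. The logistic weight functions, their midpoint and steepness, and all coefficient values are hypothetical. It does not reproduce PoPMuSiC's actual potentials, coefficients or functional forms.

```python
import math

def accessibility_weight(acc, buried, exposed, midpoint=0.5, steepness=10.0):
    """Hypothetical weight moving from `buried` (acc near 0) to `exposed`
    (acc near 1) along a logistic curve centred at `midpoint`."""
    s = 1.0 / (1.0 + math.exp(-steepness * (acc - midpoint)))
    return buried + (exposed - buried) * s

def ddg_estimate(delta_potentials, acc, weight_params, bias_params):
    """Accessibility-weighted linear combination of potential differences.

    delta_potentials: ΔW_i (mutant minus wild type) for each statistical potential.
    acc: relative solvent accessibility of the mutated residue, in [0, 1].
    weight_params: one (buried, exposed) pair per potential (hypothetical values).
    bias_params: (buried, exposed) pair for the constant term.
    """
    if len(delta_potentials) != len(weight_params):
        raise ValueError("need one weight pair per potential")
    total = accessibility_weight(acc, *bias_params)
    for dw, (buried, exposed) in zip(delta_potentials, weight_params):
        total += accessibility_weight(acc, buried, exposed) * dw
    return total

# Invented inputs: two potentials, residue half-exposed.
print(ddg_estimate([1.0, 2.0], 0.5, [(0.0, 2.0), (1.0, 1.0)], (0.0, 0.0)))
```

Because each weight is a smooth function of accessibility, a single model covers buried and exposed positions without hard cut-offs. This is one plausible reading of "coefficients depend on the solvent accessibility", not a statement about the server's internals.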
Affiliation(s)
- Yves Dehouck
- Bioinformatique génomique et structurale, Université Libre de Bruxelles, Av. Fr. Roosevelt 50, CP165/61, 1050 Brussels, Belgium.
41
Dou Y, Geng X, Gao H, Yang J, Zheng X, Wang J. Sequence Conservation in the Prediction of Catalytic Sites. Protein J 2011; 30:229-39. [PMID: 21465136 DOI: 10.1007/s10930-011-9324-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
42
Novel feature for catalytic protein residues reflecting interactions with other residues. PLoS One 2011; 6:e16932. [PMID: 21468322 PMCID: PMC3066176 DOI: 10.1371/journal.pone.0016932] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/26/2010] [Accepted: 01/10/2011] [Indexed: 11/29/2022]
Abstract
Owing to their potential for systematic analysis, complex networks have been widely used in proteomics. Representing a protein structure as a topological network provides novel insight into protein folding mechanisms, stability and function. Here, we develop a new feature to reveal correlations between residues using a protein structure network. In an original attempt to quantify the effects of several key residues on catalytic residues, a power function was used to model interactions between residues. The results indicate that focusing on a few residues is a feasible approach to identifying catalytic residues. The spatial environment surrounding a catalytic residue was analyzed in a layered manner. We present evidence that the correlation between residues is related to the distance between them. We also find that (i) most environmental parameters of the outer layer contribute less to prediction, and (ii) catalytic residues tend to be located near key positions in enzyme folds. Feature analysis showed satisfactory performance for our features. We combined them with several conventional features in a prediction model for catalytic residues, trained on a comprehensive data set from the Catalytic Site Atlas. The model reached a sensitivity of 88.6% and a specificity of 88.4% in 10-fold cross-validation. These results suggest that these features reveal the mutual dependence of residues and are promising for further study of structure-function relationships.
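Sensitivity and specificity, as reported above, can be computed from per-residue binary labels as follows. This is a generic sketch with invented labels, not the study's evaluation code. The simple round-robin fold splitter is only there to illustrate what "10-fold" means. A real evaluation would normally shuffle the data and keep homologous proteins in the same fold.

```python
def sensitivity_specificity(predicted, actual):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for binary
    per-residue labels, where 1 marks a catalytic residue."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

def k_fold_indices(n, k=10):
    """Assign indices 0..n-1 to k folds round-robin (no shuffling; illustration only)."""
    return [list(range(fold, n, k)) for fold in range(k)]

sens, spec = sensitivity_specificity([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(sens, spec)
```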
43
Somarowthu S, Yang H, Hildebrand DG, Ondrechen MJ. High-performance prediction of functional residues in proteins with machine learning and computed input features. Biopolymers 2011; 95:390-400. [DOI: 10.1002/bip.21589] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 11/08/2022]
44
Suplatov D, Arzhanik V, Švedas V. Comparative Bioinformatic Analysis of Active Site Structures in Evolutionarily Remote Homologues of α,β-Hydrolase Superfamily Enzymes. Acta Naturae 2011; 3:93-8. [PMID: 22649677 PMCID: PMC3347592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
Comparative bioinformatic analysis is the cornerstone of the study of enzymes' structure-function relationships. However, numerous enzymes that derive from a common ancestor and have undergone substantial functional changes during natural selection appear to lack the sequence similarity needed for a statistically reliable comparative analysis. At the same time, their active site structures can, in general, be conserved, while other parts may differ greatly. It is therefore both plausible and appealing to perform a comparative analysis of the most functionally important structural elements: the active site structures, that is, the amino acid residues involved in substrate binding and the catalytic mechanism. We developed a computer algorithm that builds a library of enzyme active site structures from the PDB database. It is combined with programs for structural analysis and for identifying functionally important amino acid residues and cavities in the enzyme structure. The proposed methodology has been used to compare several α,β-hydrolase superfamily enzymes. The analysis revealed a high structural similarity of catalytic site regions, including the conserved organization of the catalytic triad and oxyanion hole residues, despite the wide functional diversity among the remote homologues compared. The methodology can be used to compare the structural organization of the catalytic and substrate binding sites of various classes of enzymes, to study enzyme evolution, and to create a databank of enzyme active site structures.
Affiliation(s)
- D.A. Suplatov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University
- Belozersky Institute of Physicochemical Biology, Lomonosov Moscow State University
- V.K. Arzhanik
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University
- V.K. Švedas
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University
- Belozersky Institute of Physicochemical Biology, Lomonosov Moscow State University
45
Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification. PLoS Comput Biol 2010; 6:e1000978. [PMID: 21079665 PMCID: PMC2973806 DOI: 10.1371/journal.pcbi.1000978] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/19/2010] [Accepted: 09/27/2010] [Indexed: 11/19/2022]
Abstract
Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to perform their function. However, many non-catalytic residues are highly conserved, and not all CR are conserved throughout a given protein family, which makes identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close-proximity residues with high mutual information (MI), and that this signature can be used to distinguish functional residues from other, non-functional conserved residues. Using a data set of 434 Pfam families included in the Catalytic Site Atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measure was shown to significantly outperform both the Shannon entropy and maximal frequency measures. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor of CR, confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and shown to carry information distinct from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method had a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature of CR. We propose that such a signature should also be present in other classes of functional residues, wherever the need to maintain a particular function limits how far the structural environment can diversify during evolution.
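The conservation and covariation measures named in this abstract (Shannon entropy, Kullback-Leibler conservation and mutual information between alignment columns) have standard definitions. A bare-bones sketch over single alignment columns follows. It assumes ungapped columns of one-letter residue codes and a caller-supplied background distribution. It omits the sequence weighting, pseudocounts and gap handling that practical conservation and MI pipelines usually add, and it does not reproduce the pMI, pC or Cls scores.

```python
import math
from collections import Counter

def column_freqs(column):
    """Observed residue frequencies in one alignment column (a string)."""
    counts = Counter(column)
    n = len(column)
    return {aa: c / n for aa, c in counts.items()}

def shannon_entropy(column):
    """Shannon entropy in bits; 0 for a fully conserved column."""
    return -sum(p * math.log2(p) for p in column_freqs(column).values())

def kl_conservation(column, background):
    """KL divergence (bits) of column frequencies from background frequencies.
    `background` must contain every residue that occurs in the column."""
    return sum(p * math.log2(p / background[aa])
               for aa, p in column_freqs(column).items())

def mutual_information(col_i, col_j):
    """MI (bits) between two aligned columns of equal length."""
    n = len(col_i)
    p_i, p_j = column_freqs(col_i), column_freqs(col_j)
    joint = Counter(zip(col_i, col_j))
    return sum((c / n) * math.log2((c / n) / (p_i[a] * p_j[b]))
               for (a, b), c in joint.items())

# Toy columns: perfectly co-varying vs. independent.
print(mutual_information("AACC", "DDEE"), mutual_information("ACAC", "DDEE"))
```

Unlike entropy, the KL measure scores conservation relative to how common each residue is in the background. This is one intuition for why the abstract reports it outperforming plain entropy.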
46
Aniba MR, Poch O, Thompson JD. Issues in bioinformatics benchmarking: the case study of multiple sequence alignment. Nucleic Acids Res 2010; 38:7353-63. [PMID: 20639539 PMCID: PMC2995051 DOI: 10.1093/nar/gkq625] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 04/27/2010] [Revised: 06/10/2010] [Accepted: 06/29/2010] [Indexed: 11/13/2022]
Abstract
The post-genomic era presents many new challenges for the field of bioinformatics. Novel computational approaches are now being developed to handle the large, complex and noisy datasets produced by high throughput technologies. Objective evaluation of these methods is essential (i) to assure high quality, (ii) to identify strong and weak points of the algorithms, (iii) to measure the improvements introduced by new methods and (iv) to enable non-specialists to choose an appropriate tool. Here, we discuss the development of formal benchmarks, designed to represent the current problems encountered in the bioinformatics field. We consider several criteria for building good benchmarks and the advantages to be gained when they are used intelligently. To illustrate these principles, we present a more detailed discussion of benchmarks for multiple alignments of protein sequences. As in many other domains, significant progress has been achieved in the multiple alignment field and the datasets have become progressively more challenging as the existing algorithms have evolved. Finally, we propose directions for future developments that will ensure that the bioinformatics benchmarks correspond to the challenges posed by the high throughput data.
Affiliation(s)
- Mohamed Radhouene Aniba
- Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Department of Structural Biology and Genomics, Institut National de la Santé et de la Recherche Médicale (INSERM), U596, The Centre National de la Recherche Scientifique (CNRS), UMR7104, F-67400 Illkirch and Université de Strasbourg, F-67000 Strasbourg, France
- Olivier Poch
- Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Department of Structural Biology and Genomics, Institut National de la Santé et de la Recherche Médicale (INSERM), U596, The Centre National de la Recherche Scientifique (CNRS), UMR7104, F-67400 Illkirch and Université de Strasbourg, F-67000 Strasbourg, France
- Julie D. Thompson
- Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), Department of Structural Biology and Genomics, Institut National de la Santé et de la Recherche Médicale (INSERM), U596, The Centre National de la Recherche Scientifique (CNRS), UMR7104, F-67400 Illkirch and Université de Strasbourg, F-67000 Strasbourg, France
47
Roche DB, Tetchner SJ, McGuffin LJ. The binding site distance test score: a robust method for the assessment of predicted protein binding sites. Bioinformatics 2010; 26:2920-1. [PMID: 20861025 DOI: 10.1093/bioinformatics/btq543] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION: We propose a novel method for scoring the accuracy of protein binding site predictions: the Binding-site Distance Test (BDT) score. Recently, the Matthews Correlation Coefficient (MCC) has been used to evaluate binding site predictions, both by developers of new methods and by the assessors of the community-wide prediction experiment CASP8. Although it is a rigorous scoring method, the MCC does not take into account the actual 3D distance of the predicted residues from the observed binding site. Thus, an incorrectly predicted site that is nevertheless close to the observed binding site will obtain the same score as the same number of non-binding residues predicted at random. The MCC is also somewhat affected by the subjectivity of determining observed binding residues and the ambiguity of choosing distance cutoffs. By contrast, the BDT method produces continuous scores ranging between 0 and 1, relating to the distance between the predicted and observed residues. Residues predicted close to the binding site will score higher than those more distant, providing a better reflection of the true accuracy of predictions. The CASP8 function predictions were evaluated using both the MCC and BDT methods, and the scores were compared. The BDT was found to correlate strongly with the MCC scores while being less susceptible to the subjectivity of defining binding residues. We therefore suggest that this simple new score is a potentially more robust method for future evaluations of protein-ligand binding site predictions. AVAILABILITY: http://www.reading.ac.uk/bioinf/downloads/.
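The limitation of the MCC described above can be made concrete. The sketch below computes the standard MCC for a predicted set of binding-site residue indices against the observed set. It then scores two wrong predictions of the same size, one assumed to sit next to the site in 3D and one assumed to lie far away, and the two get identical MCC values. The residue indices and protein length are invented, and the BDT score itself is not implemented because its exact formula is not given here.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.
    Returns 0.0 when any marginal is empty (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def binding_site_mcc(predicted, observed, n_residues):
    """MCC of predicted vs. observed binding-site residue indices
    in a protein with n_residues residues."""
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = n_residues - tp - fp - fn
    return mcc(tp, fp, tn, fn)

observed_site = {10, 11, 12}
# Suppose residues 13 and 14 contact the site in 3D while 80 and 81 are distant:
# both predictions miss entirely, so the MCC cannot tell them apart.
near_miss = binding_site_mcc({13, 14}, observed_site, 100)
far_miss = binding_site_mcc({80, 81}, observed_site, 100)
print(near_miss, far_miss)
```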
Affiliation(s)
- Daniel B Roche
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
48
Schmidtke P, Barril X. Understanding and Predicting Druggability. A High-Throughput Method for Detection of Drug Binding Sites. J Med Chem 2010; 53:5858-67. [DOI: 10.1021/jm100574m] [Citation(s) in RCA: 223] [Impact Index Per Article: 14.9] [Indexed: 11/28/2022]
Affiliation(s)
- Peter Schmidtke
- Departament de Fisicoquímica, Facultat de Farmàcia, Universitat de Barcelona, Av. Joan XXIII s/n, 08028 Barcelona, Spain
- Xavier Barril
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Institut de Biomedicina de la Universitat de Barcelona (IBUB), Barcelona, Spain
- Departament de Fisicoquímica, Facultat de Farmàcia, Universitat de Barcelona, Av. Joan XXIII s/n, 08028 Barcelona, Spain
49
Huang LT, Gromiha MM. First insight into the prediction of protein folding rate change upon point mutation. Bioinformatics 2010; 26:2121-7. [DOI: 10.1093/bioinformatics/btq350] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
50
Hagopian R, Davidson JR, Datta RS, Samad B, Jarvis GR, Sjölander K. SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction. Nucleic Acids Res 2010; 38:W29-34. [PMID: 20430824 PMCID: PMC2896197 DOI: 10.1093/nar/gkq298] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 02/07/2010] [Revised: 03/27/2010] [Accepted: 04/07/2010] [Indexed: 11/29/2022]
Abstract
We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format and outputs a phylogenetic tree and an MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.
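The abstract reports "alignment accuracy" on PREFAB structural pairs without defining the measure. One common choice for PREFAB-style benchmarks is the Q score: the fraction of residue pairs aligned in the structural reference that the test alignment also aligns. The sketch below assumes that metric for a single sequence pair (for example, the two reference sequences' rows extracted from a larger MSA, where columns that are gaps in both rows are simply skipped). It is not SATCHMO-JS code, and the sequences are made up.

```python
from __future__ import annotations

GAP = "-"


def aligned_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Residue-index pairs (i, j) that two gapped rows place in the same column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs: set[tuple[int, int]] = set()
    i = j = 0  # ungapped residue counters for each sequence
    for a, b in zip(row_a, row_b):
        if a != GAP and b != GAP:
            pairs.add((i, j))
        if a != GAP:
            i += 1
        if b != GAP:
            j += 1
    return pairs


def q_score(test: tuple[str, str], reference: tuple[str, str]) -> float:
    """Fraction of reference-aligned residue pairs reproduced by the test alignment."""
    for t, r in zip(test, reference):
        if t.replace(GAP, "") != r.replace(GAP, ""):
            raise ValueError("test and reference must align the same sequences")
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


if __name__ == "__main__":
    # Hypothetical structural reference vs. a naive ungapped test alignment.
    reference = ("ACDE-FG", "AC-EHFG")
    test = ("ACDEFG", "ACEHFG")
    print(q_score(test, reference))
```

Here the ungapped test alignment recovers 4 of the 5 reference pairs, so it scores 0.8; averaging such per-pair scores over a benchmark is one way tools can be compared, though the statistics used in the paper are not specified here.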
Affiliation(s)
- Raffi Hagopian
- Department of Bioengineering, QB3 Institute and Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- John R. Davidson
- Department of Bioengineering, QB3 Institute and Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Ruchira S. Datta
- Department of Bioengineering, QB3 Institute and Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Bushra Samad
- Department of Bioengineering, QB3 Institute and Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Glen R. Jarvis
- Department of Bioengineering, QB3 Institute and Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Kimmen Sjölander
- Department of Bioengineering, QB3 Institute and Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA