1
Zhou Y, Myung Y, Rodrigues CHM, Ascher DB. DDMut-PPI: predicting effects of mutations on protein-protein interactions using graph-based deep learning. Nucleic Acids Res 2024:gkae412. [PMID: 38783112 DOI: 10.1093/nar/gkae412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 04/30/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024]
Abstract
Protein-protein interactions (PPIs) play a vital role in cellular functions and are essential for therapeutic development and for understanding disease. However, current predictive tools often struggle to balance efficiency and precision when predicting the effects of mutations on these complex interactions. To address this, we present DDMut-PPI, a deep learning model that efficiently and accurately predicts changes in PPI binding free energy upon single and multiple point mutations. Building on the Siamese network architecture with graph-based signatures from our prior work, DDMut, we enhanced DDMut-PPI with a graph convolutional network operating on the protein interaction interface. We used residue-specific embeddings from the ProtT5 protein language model as node features and a variety of molecular interactions as edge features. By integrating evolutionary context with spatial information, this framework enables DDMut-PPI to achieve a Pearson correlation of up to 0.75 (root mean squared error: 1.33 kcal/mol) in our evaluations, outperforming most existing methods. Importantly, the model demonstrated consistent performance across mutations that increase or decrease binding affinity. DDMut-PPI offers a significant advancement in the field and will serve as a valuable tool for researchers probing the complexities of protein interactions. DDMut-PPI is freely available as a web server and an application programming interface at https://biosig.lab.uq.edu.au/ddmut_ppi.
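The abstract quotes a Pearson correlation and a root mean squared error (RMSE). These are the standard metrics for agreement between predicted and experimental changes in binding free energy (ΔΔG). The sketch below shows how they are computed, assuming plain Python lists of paired values. The numbers are hypothetical and do not come from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(pred, obs):
    """Root mean squared error, in the units of the inputs (e.g. kcal/mol)."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


# Hypothetical ddG values in kcal/mol, not data from the paper.
experimental = [1.2, -0.5, 2.3, 0.0, 3.1]
predicted = [1.0, -0.2, 1.9, 0.4, 2.6]
print(pearson_r(predicted, experimental), rmse(predicted, experimental))
```

The two metrics are complementary. Pearson's r is unitless and measures ranking agreement, while RMSE is in kcal/mol and captures absolute error. A model can therefore rank mutations well yet still be offset in absolute ΔΔG. One way to check the balanced performance the abstract describes is to compute both metrics separately on the mutations that increase affinity and on those that decrease it.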
Affiliation(s)
- Yunzhuo Zhou
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Yoochan Myung
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Carlos H M Rodrigues
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland 4072, Australia
- David B Ascher
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia

2
Myung Y, de Sá AGC, Ascher DB. Deep-PK: deep learning for small molecule pharmacokinetic and toxicity prediction. Nucleic Acids Res 2024:gkae254. [PMID: 38634808 DOI: 10.1093/nar/gkae254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 03/20/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024]
Abstract
Evaluating the pharmacokinetic properties of small molecules is a key step in most drug development and high-throughput screening processes. Pharmacokinetics, which describes the fate of drugs in the human body, is generally considered from four perspectives: absorption, distribution, metabolism and excretion, all of which are closely related to a fifth perspective, toxicity (ADMET). Since obtaining ADMET data from in vitro, in vivo or pre-clinical stages is time consuming and expensive, many efforts have been made to predict ADMET properties via computational approaches. However, most available methods are limited in their ability to provide pharmacokinetic and toxicity predictions for diverse targets, ensure good overall accuracy, and offer ease of use, interpretability and extensibility for further optimization. Here, we introduce Deep-PK, a deep learning-based platform for pharmacokinetic and toxicity prediction, analysis and optimization. We applied graph neural networks and graph-based signatures as a graph-level feature to yield the best predictive performance across 73 endpoints, including 64 ADMET and 9 general properties. With these models, Deep-PK supports molecular optimization and interpretation, helping users optimize and understand the pharmacokinetics and toxicity of given input molecules. Deep-PK is freely available at https://biosig.lab.uq.edu.au/deeppk/.
Affiliation(s)
- Yoochan Myung
- School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Alex G C de Sá
- School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria 3010, Australia
- David B Ascher
- School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria 3010, Australia

3
Wang JL, Liu MS, Fu YD, Kan QB, Li CY, Ma R, Fang ZW, Liu HX, Li MX, Lv JL, Sang P, Zhang C, Li HW. Exploring the conformational dynamics and thermodynamics of EGFR S768I and G719X + S768I mutations in non-small cell lung cancer: An in silico approaches. Open Life Sci 2023; 18:20220768. [PMID: 38035047 PMCID: PMC10685407 DOI: 10.1515/biol-2022-0768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 09/27/2023] [Accepted: 10/05/2023] [Indexed: 12/02/2023]
Abstract
Non-small cell lung cancer (NSCLC) is often driven by mutations in the epidermal growth factor receptor (EGFR) gene. However, rare mutations such as G719X and S768I lack standard anti-EGFR targeted therapies. Understanding the structural differences between wild-type EGFR and these rare mutants is crucial for developing EGFR-targeted drugs. We performed a systematic analysis using molecular dynamics simulations, essential dynamics (ED), molecular mechanics Poisson-Boltzmann surface area, and free energy calculation methods to compare the kinetic properties, molecular motion, and free energy distribution of wild-type EGFR and the rare mutant structures G719X-EGFR, S768I-EGFR, and G719X + S768I-EGFR. Our results showed that S768I-EGFR and G719X + S768I-EGFR have higher global and local conformational flexibility and lower thermal and global structural stability than WT-EGFR. ED analysis revealed different molecular motion patterns among S768I-EGFR, G719X + S768I-EGFR, and WT-EGFR. The A-loop and αC-helix, crucial structural elements related to the active state, showed a tendency to move toward the active state, providing a mechanistic explanation at the molecular level for NSCLC caused by EGFR S768I and EGFR G719C + S768I mutations. These findings may aid the structure-based development of new EGFR-targeted drugs for patients with EGFR S768I and EGFR G719X + S768I mutations.
Affiliation(s)
- Jun-Ling Wang
- Clinical Laboratory, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Ming-Sheng Liu
- Department of Urological Surgery, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Yu-Dong Fu
- Department of Thoracic Surgery, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Qiang-Bo Kan
- Department of Thoracic Surgery, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Chun-Yan Li
- Department of Oncology, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Rong Ma
- Clinical Laboratory, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Zhe-Wei Fang
- Clinical Laboratory, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Hong-Xia Liu
- Clinical Laboratory, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Meng-Xian Li
- Clinical Laboratory, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Jia-Ling Lv
- Department of Oncology, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Peng Sang
- School of Life Science, Dali University, Dali 671003, China
- Chao Zhang
- Department of Oncology, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China
- Hong-Wei Li
- Clinical Laboratory, Kunming Medical University Affiliated Qujing Hospital, Qujing 655000, China

4
Reis DR, Santos BC, Bleicher L, Zárate LE, Nobre CN. Prediction of enzymatic function with high efficiency and a reduced number of features using genetic algorithm. Comput Biol Med 2023; 158:106799. [PMID: 37028140 DOI: 10.1016/j.compbiomed.2023.106799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Revised: 02/04/2023] [Accepted: 03/20/2023] [Indexed: 04/07/2023]
Abstract
The post-genomic era has raised a growing demand for efficient procedures to identify protein functions, which can be accomplished by applying machine learning to the set of characteristics extracted from a protein. This approach is feature-based and has been the focus of several works in bioinformatics. In this work, we investigated which protein characteristics, representing the primary, secondary, tertiary, and quaternary structures of the protein, improve model quality. To do so, we applied dimensionality reduction techniques and used a Support Vector Machine classifier to predict enzyme classes. Two approaches were evaluated: feature extraction/transformation, performed using the statistical technique Factor Analysis, and feature selection. For feature selection, we proposed a genetic algorithm-based approach to address the optimization conflict between the simplicity and the reliability of an ideal representation of enzyme characteristics, and we also compared and employed other methods for this purpose. The best result was achieved using a feature subset generated by our implementation of a multi-objective genetic algorithm, enriched with features that this work identified as relevant for representing enzymes. This subset reduced the dataset by about 87% and reached an F-measure of 85.78%, improving the model's overall classification quality. In addition, we identified a subset of only 28 of the 424 features that reached an F-measure above 80% for four of the six evaluated classes, showing that satisfactory classification performance can be achieved with a reduced number of enzyme characteristics. The datasets and implementations are openly available.
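The multi-objective genetic algorithm trades subset size against classification quality. For scale, 28 of 424 features is about 6.6% of the full set. Pareto-dominance filtering is a building block common to multi-objective evolutionary algorithms, and the sketch below shows it. It assumes each candidate subset is summarised by two numbers: its feature count (to minimise) and its F-measure (to maximise). The candidate names and scores are hypothetical. The selection, crossover and mutation steps of a genetic algorithm are omitted, and this is not the authors' implementation.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    n_features: int   # objective 1: minimise
    f_measure: float  # objective 2: maximise


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on at least one."""
    no_worse = a.n_features <= b.n_features and a.f_measure >= b.f_measure
    strictly_better = a.n_features < b.n_features or a.f_measure > b.f_measure
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates no other candidate dominates (a candidate never dominates itself)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical feature subsets, not results from the paper.
candidates = [
    Candidate("all_features", 424, 0.83),
    Candidate("subset_b", 60, 0.85),
    Candidate("subset_c", 28, 0.81),
    Candidate("subset_d", 100, 0.82),
]
print([c.name for c in pareto_front(candidates)])  # ['subset_b', 'subset_c']
```

Neither surviving candidate beats the other on both objectives: one is smaller and the other scores higher. That is the simplicity-versus-reliability conflict the abstract refers to.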

5
Aljarf R, Tang S, Pires DEV, Ascher DB. embryoTox: Using Graph-Based Signatures to Predict the Teratogenicity of Small Molecules. J Chem Inf Model 2023; 63:432-441. [PMID: 36595441 DOI: 10.1021/acs.jcim.2c00824] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/04/2023]
Abstract
Teratogenic drugs can cause severe fetal malformations and consequently critically affect fetal health, yet the teratogenic risks associated with most approved drugs are unknown. Here, we propose a novel predictive tool, embryoTox, which uses a graph-based signature representation of the chemical structure of a small molecule to predict and classify molecules likely to be safe during pregnancy. embryoTox was trained and validated using in vitro bioactivity data for over 700 small molecules with characterized teratogenicity effects. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.96 on 10-fold cross-validation and 0.82 on nonredundant blind tests, outperforming alternative approaches. We believe that our predictive tool will provide a practical resource for optimizing screening libraries to identify effective and safe molecules to use during pregnancy. To provide a simple and integrated platform to rapidly screen for potentially safe molecules and their risk factors, we made embryoTox freely available online at https://biosig.lab.uq.edu.au/embryotox/.
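The AUC has a direct probabilistic reading. It is the chance that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way by comparing every positive/negative pair. This is equivalent to the Mann-Whitney U statistic divided by the number of pairs. Which class counts as positive (for example, teratogenic) is an assumption here, and the labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive class) and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 3 of 4 positive/negative pairs ranked correctly -> 0.75
```

The pairwise loop is O(P × N) and is meant only for illustration. A rank-based implementation, or a library routine such as scikit-learn's `roc_auc_score`, scales better to large test sets.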
Affiliation(s)
- Raghad Aljarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Simon Tang
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia

6
Ascher DB, Kaminskas LM, Myung Y, Pires DEV. Using Graph-Based Signatures to Guide Rational Antibody Engineering. Methods Mol Biol 2023; 2552:375-397. [PMID: 36346604 DOI: 10.1007/978-1-0716-2609-2_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Antibodies are essential experimental and diagnostic tools and as biotherapeutics have significantly advanced our ability to treat a range of diseases. With recent innovations in computational tools to guide protein engineering, we can now rationally design better antibodies with improved efficacy, stability, and pharmacokinetics. Here, we describe the use of the mCSM web-based in silico suite, which uses graph-based signatures to rapidly identify the structural and functional consequences of mutations, to guide rational antibody engineering to improve stability, affinity, and specificity.
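The same group's earlier work introduced the graph-based signatures used here, which also appear in several other entries in this list. Broadly, they are distance patterns between atoms labelled with pharmacophore-like classes, accumulated over increasing distance cutoffs. The sketch below is a simplified, hypothetical illustration of that cumulative-distance idea on a toy set of labelled 3D coordinates. The label vocabulary, cutoffs and function names are assumptions. This is not the mCSM implementation, and it does not use mCSM's parameters.

```python
import itertools
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Toy atom representation: (hypothetical pharmacophore label, (x, y, z) in angstroms).
Atom = Tuple[str, Tuple[float, float, float]]


def cumulative_distance_signature(
    atoms: Sequence[Atom], vocabulary: Sequence[str], cutoffs: Sequence[float]
) -> List[int]:
    """Count atom pairs of each label pair within each cutoff (cumulative: d <= cutoff).

    The output has one entry per (unordered label pair, cutoff) in a fixed order,
    so structures with different atoms still map to vectors of the same length.
    """
    label_pairs = list(itertools.combinations_with_replacement(sorted(vocabulary), 2))
    counts: Counter = Counter()
    for (label_a, pos_a), (label_b, pos_b) in itertools.combinations(atoms, 2):
        pair = tuple(sorted((label_a, label_b)))
        distance = math.dist(pos_a, pos_b)
        for cutoff in cutoffs:
            if distance <= cutoff:
                counts[(pair, cutoff)] += 1
    return [counts[(pair, cutoff)] for pair in label_pairs for cutoff in cutoffs]


toy_atoms = [
    ("hydrophobic", (0.0, 0.0, 0.0)),
    ("hydrophobic", (3.0, 0.0, 0.0)),
    ("donor", (0.0, 4.0, 0.0)),
]
# Order: (donor, donor), (donor, hydrophobic), (hydrophobic, hydrophobic), each at 4 and 6 A.
print(cumulative_distance_signature(toy_atoms, ["donor", "hydrophobic"], [4.0, 6.0]))
# [0, 0, 1, 2, 1, 1]
```

Fixing the label vocabulary up front keeps the vector length identical for every structure. That is what lets such signatures serve as inputs to a standard machine learning model. Because the counting is cumulative, each cutoff's count includes all closer pairs.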
Affiliation(s)
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Department of Biochemistry, Cambridge University, Cambridge, UK
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
- Lisa M Kaminskas
- School of Biological Sciences, University of Queensland, St Lucia, QLD, Australia
- Yoochan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville, VIC, Australia

7
Martins P, Mariano D, Carvalho FC, Bastos LL, Moraes L, Paixão V, Cardoso de Melo-Minardi R. Propedia v2.3: A novel representation approach for the peptide-protein interaction database using graph-based structural signatures. Front Bioinform 2023; 3:1103103. [PMID: 36875148 PMCID: PMC9978205 DOI: 10.3389/fbinf.2023.1103103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Accepted: 01/30/2023] [Indexed: 02/18/2023]
Affiliation(s)
- Pedro Martins
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Diego Mariano
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Frederico Chaves Carvalho
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Luana Luiza Bastos
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Lucas Moraes
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Vivian Paixão
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Raquel Cardoso de Melo-Minardi
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

8
Rezende PM, Xavier JS, Ascher DB, Fernandes GR, Pires DEV. Evaluating hierarchical machine learning approaches to classify biological databases. Brief Bioinform 2022; 23:6611916. [PMID: 35724625 PMCID: PMC9310517 DOI: 10.1093/bib/bbac216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 04/29/2022] [Accepted: 05/09/2022] [Indexed: 12/04/2022]
Abstract
The rate of biological data generation has increased dramatically in recent years, which has raised the importance of databases as a resource to guide innovation and the generation of biological insights. Given the complexity and scale of these databases, automatic data classification is often required. Biological data sets are often hierarchical in nature, with varying degrees of complexity, posing different challenges for training, testing and validating accurate and generalizable classification models. While some approaches to classify hierarchical data have been proposed, no guidelines regarding their utility, applicability and limitations have been explored or implemented. These approaches include ‘Local’ approaches, which consider the hierarchy by building models per level or per node, and ‘Global’ hierarchical classification, which uses a flat classification approach. To fill this gap, we systematically contrasted the performance of ‘Local per Level’ and ‘Local per Node’ approaches with a ‘Global’ approach applied to two different hierarchical datasets: BioLiP and CATH. The results show how different components of hierarchical data sets, such as the variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this process when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.
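The three strategies compared above differ mainly in how training sets are carved out of the hierarchy. The sketch below shows that step only. It assumes dotted hierarchical labels in the style of CATH codes (class.architecture.topology) and uses hypothetical sample identifiers. The classifiers that would be trained on each set are omitted, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (hypothetical sample id, dotted hierarchical label)


def truncate(label: str, depth: int) -> str:
    """Ancestor of a dotted label at a given depth; depth 0 is the root ('')."""
    return ".".join(label.split(".")[:depth])


def local_per_level(samples: List[Sample], n_levels: int) -> Dict[int, List[Sample]]:
    """One training set per level: every sample, relabelled with its ancestor at that level."""
    return {
        level: [(sid, truncate(label, level)) for sid, label in samples]
        for level in range(1, n_levels + 1)
    }


def local_per_node(samples: List[Sample], n_levels: int) -> Dict[str, List[Sample]]:
    """One training set per internal node: only samples below that node, labelled by child."""
    datasets: Dict[str, List[Sample]] = defaultdict(list)
    for sid, label in samples:
        for level in range(n_levels):
            datasets[truncate(label, level)].append((sid, truncate(label, level + 1)))
    return dict(datasets)


def global_flat(samples: List[Sample]) -> List[Sample]:
    """A single flat training set over the full-depth labels."""
    return list(samples)


samples = [("p1", "1.10.8"), ("p2", "1.10.20"), ("p3", "3.40.50")]
print(local_per_level(samples, 3)[2])    # [('p1', '1.10'), ('p2', '1.10'), ('p3', '3.40')]
print(local_per_node(samples, 3)["1.10"])  # [('p1', '1.10.8'), ('p2', '1.10.20')]
```

In the per-node view, a node with a single child yields a one-class training set that needs no classifier. Here, node `1` has `1.10` as its only child. Meanwhile, the number of per-node models grows with the number of internal nodes. The per-level view trains one model per level, and the global view trains just one.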
Affiliation(s)
- Pâmela M Rezende
- Universidade Federal de Minas Gerais
- Instituto René Rachou, Fundação Oswaldo Cruz
- Stilingue Inteligência Artificial
- Joicymara S Xavier
- Universidade Federal de Minas Gerais
- Instituto René Rachou, Fundação Oswaldo Cruz
- Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland
- Systems and Computational Biology, Bio21 Institute, University of Melbourne
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
- School of Computing and Information Systems, University of Melbourne

9
Serov N, Vinogradov V. Artificial intelligence to bring nanomedicine to life. Adv Drug Deliv Rev 2022; 184:114194. [PMID: 35283223 DOI: 10.1016/j.addr.2022.114194] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 12/16/2021] [Revised: 03/04/2022] [Accepted: 03/07/2022] [Indexed: 12/13/2022]
Abstract
Drug delivery system (DDS) technology has demonstrated outstanding performance and effectiveness in the production of pharmaceuticals. Many FDA-approved nanomedicines attest to this, offering enhanced selectivity, manageable drug release kinetics and synergistic therapeutic actions. Nonetheless, the rational design and high-throughput development of nanomaterial-based DDSs for specific purposes is still far from routine practice and remains in its infancy. This is mainly because of limits on scientists' ability to effectively acquire, analyze, manage, and comprehend complex and ever-growing sets of experimental data, which is vital for developing DDSs with a desired set of functionalities. At the same time, this task is feasible for data-driven approaches, high-throughput experimentation techniques, process automation, artificial intelligence (AI) technology, and machine learning (ML) approaches, collectively referred to as the Fourth Paradigm of scientific research. Therefore, integrating these approaches with nanomedicine and nanotechnology can potentially accelerate the rational design and high-throughput development of highly efficient nanoformulated drugs and smart materials with pre-defined functionalities. In this Review, we survey the important results and milestones achieved to date in applying data science, high-throughput, and automation approaches, combined with AI and ML, to design and optimize DDSs and related nanomaterials. The mission of this manuscript is not only to reflect the state of the art in data-driven nanomedicine, but also to show how recent findings in related fields can transform nanomedicine's image. We discuss how all these results can be used to boost the translation of nanomedicine to the clinic. We also highlight future directions for the data-driven, high-throughput-experimentation-assisted, and AI-assisted design and production of nanoformulated drugs and smart materials with pre-defined properties and behavior. This Review will be of high interest to chemists involved in materials science, nanotechnology, and DDS development for biomedical applications, although the general nature of the presented approaches enables knowledge translation to many other fields of science.
Affiliation(s)
- Nikita Serov
- International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, Saint-Petersburg 191002, Russian Federation
- Vladimir Vinogradov
- International Institute "Solution Chemistry of Advanced Materials and Technologies", ITMO University, Saint-Petersburg 191002, Russian Federation

10
Pires DEV, Stubbs KA, Mylne JS, Ascher DB. cropCSM: designing safe and potent herbicides with graph-based signatures. Brief Bioinform 2022; 23:6535680. [PMID: 35211724 PMCID: PMC9155605 DOI: 10.1093/bib/bbac042] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/09/2021] [Revised: 01/26/2022] [Accepted: 01/27/2022] [Indexed: 12/11/2022]
Abstract
Herbicides have revolutionised weed management, increased crop yields and improved profitability, allowing for an increase in worldwide food security. Their widespread use, however, has also led to a rise in resistance and concerns about their environmental impact. Despite the need for potent and safe herbicidal molecules, no herbicide with a new mode of action has reached the market in 30 years. Computational approaches have proven invaluable in guiding rational drug discovery pipelines, leading to higher hit rates and lower attrition due to poor toxicity. In contrast, little has been done for herbicide design. To fill this gap, we have developed cropCSM, a computational platform to help identify new, potent, nontoxic and environmentally safe herbicides. Using a knowledge-based approach, we identified physicochemical properties and substructures enriched in safe herbicides. By representing small molecules as graphs, we leveraged these insights to guide the development of predictive models. These models were trained and tested on the largest data set to date of molecules with experimentally characterised herbicidal profiles (over 4500 compounds). In addition, we developed six new environmental and human toxicity predictors, spanning five different species, to assist in molecule prioritisation. cropCSM correctly identified 97% of herbicides currently available commercially, while predicting toxicity profiles with accuracies of up to 92%. We believe cropCSM will be an essential tool for enriching screening libraries and guiding the development of potent and safe herbicides. We have made the method freely available through a user-friendly webserver at http://biosig.unimelb.edu.au/crop_csm.
Affiliation(s)
- Douglas E V Pires
- School of Computing and Information Systems at the University of Melbourne
- Keith A Stubbs
- School of Molecular Sciences at the University of Western Australia
- Joshua S Mylne
- Curtin University and Deputy Director of the Centre for Crop and Disease Management
- David B Ascher
- University of Queensland, and head of Computational Biology and Clinical Informatics at the Baker Institute and Systems

11
de Castro Barbosa E, Alves TMA, Kohlhoff M, Jangola STG, Pires DEV, Figueiredo ACC, Alves ÉAR, Calzavara-Silva CE, Sobral M, Kroon EG, Rosa LH, Zani CL, de Oliveira JG. Searching for plant-derived antivirals against dengue virus and Zika virus. Virol J 2022; 19:31. [PMID: 35193667 PMCID: PMC8861615 DOI: 10.1186/s12985-022-01751-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/08/2021] [Accepted: 01/23/2022] [Indexed: 12/21/2022]
Abstract
Background The worldwide epidemics of diseases such as dengue and Zika have triggered an intense effort to repurpose drugs and search for novel antivirals to treat patients, because no approved drugs for these diseases are currently available. Our aim was to screen plant-derived extracts to identify and isolate compounds with antiviral properties against dengue virus (DENV) and Zika virus (ZIKV).
Methods Seven thousand plant extracts were screened in vitro for antiviral properties against DENV-2 and ZIKV. Activity was measured as reduction of the viral cytopathic effect, assessed with the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) method, previously validated for this purpose. Selected extracts were submitted to bioactivity-guided fractionation using high- and ultrahigh-pressure liquid chromatography. In parallel, high-resolution mass spectrometric data (MSn) were collected from each fraction, allowing compounds in the active fractions to be tracked in subsequent fractionation procedures. The virucidal activity of extracts and compounds was assessed using the plaque reduction assay. EC50 and CC50 were determined by dose-response experiments, and the ratio CC50/EC50 was used as a selectivity index (SI) to measure antiviral vs. cytotoxic activity. Purified compounds were analyzed by nuclear magnetic resonance spectroscopy to identify their chemical structures. Two compounds were combined in different proportions and submitted to bioassays against both viruses to investigate possible synergy. The pharmacokinetic and toxicity (ADMET) properties of the antiviral compounds were predicted in silico using the pkCSM platform.
Results We detected antiviral activity against DENV-2 and ZIKV in 21 extracts obtained from 15 plant species. Hippeastrum (Amaryllidaceae) was the most represented genus, affording seven active extracts. Bioactivity-guided fractionation of several extracts led to the purification of lycorine, pretazettine, narciclasine, and narciclasine-4-O-β-D-xylopyranoside (NXP). Another 16 compounds were identified in active fractions. Combining lycorine and pretazettine did not improve their antiviral activity against either DENV-2 or ZIKV. ADMET prediction suggested that these four compounds may have good metabolism and no mutagenic toxicity. Predicted oral absorption, distribution, and excretion parameters identify lycorine and pretazettine as candidates for testing in animal models.
Conclusions Our results showed that plant extracts, especially those from the Hippeastrum genus, can be a valuable source of antiviral compounds against ZIKV and DENV-2. Most of the compounds identified have never previously been described as active against ZIKV or other viruses.
Supplementary Information The online version contains supplementary material available at 10.1186/s12985-022-01751-z.
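The selectivity index relates the two dose-response endpoints used in this study. It is conventionally defined as CC50/EC50. Values well above 1 therefore mean a compound inhibits the virus at concentrations far below those that are toxic to the host cells. The sketch below uses hypothetical concentrations, not values from the study.

```python
def selectivity_index(cc50: float, ec50: float) -> float:
    """Selectivity index SI = CC50 / EC50 (both in the same concentration units)."""
    if cc50 <= 0 or ec50 <= 0:
        raise ValueError("CC50 and EC50 must be positive concentrations")
    return cc50 / ec50


# Hypothetical concentrations in micromolar.
print(selectivity_index(cc50=50.0, ec50=0.5))  # 100.0
```

Because the units cancel, SI is dimensionless. For the ratio to be meaningful, however, CC50 and EC50 must be expressed in the same units and should come from comparable cell-based assays.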
Affiliation(s)
- Emerson de Castro Barbosa
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Tânia Maria Almeida Alves
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Markus Kohlhoff
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Soraya Torres Gaze Jangola
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Douglas Eduardo Valente Pires
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil; School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, 3052, Australia
- Anna Carolina Cançado Figueiredo
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Érica Alessandra Rocha Alves
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Carlos Eduardo Calzavara-Silva
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Marcos Sobral
- Departamento de Ciências Naturais, Universidade Federal de São João del-Rei, Campus Dom Bosco - Praça Dom Helvécio, 74, São João del-Rei, Minas Gerais, 36301-160, Brasil
- Erna Geessien Kroon
- Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Av Antônio Carlos 6627, Belo Horizonte, Minas Gerais, 31270-901, Brasil
- Luiz Henrique Rosa
- Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Av Antônio Carlos 6627, Belo Horizonte, Minas Gerais, 31270-901, Brasil
- Carlos Leomar Zani
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Jaquelline Germano de Oliveira
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
12
Nguyen TB, Pires DEV, Ascher DB. CSM-carbohydrate: protein-carbohydrate binding affinity prediction and docking scoring function. Brief Bioinform 2021; 23:6457169. [PMID: 34882232 DOI: 10.1093/bib/bbab512] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/23/2021] [Revised: 11/06/2021] [Accepted: 11/08/2021] [Indexed: 12/29/2022]
Abstract
Protein-carbohydrate interactions are crucial for many cellular processes but can be challenging to characterise biologically. To improve our understanding and ability to model these molecular interactions, we used a carefully curated set of 370 protein-carbohydrate complexes with experimental structural and biophysical data to train and validate a new tool, cutoff scanning matrix (CSM)-carbohydrate. The tool uses machine learning algorithms to accurately predict binding affinity and, as a scoring function, to rank docking poses. The shape and chemical complementarity of both protein and carbohydrate were captured using graph-based structural signatures. We achieved comparable Pearson's correlations of 0.72 under cross-validation [root mean square error (RMSE) of 1.58 kcal/mol] and 0.67 on the independent test set (RMSE of 1.72 kcal/mol), providing confidence in the generalisability and robustness of the final model. Similar performance was obtained across mono-, di- and oligosaccharides, further highlighting the applicability of this approach to the study of larger complexes. We show that CSM-carbohydrate significantly outperformed previous approaches. We have implemented our method and made all data freely available at http://biosig.unimelb.edu.au/csm_carbohydrate/, through both a user-friendly web interface and an application programming interface for programmatic access. We believe CSM-carbohydrate will be an invaluable tool for helping assess docking poses and the effects of mutations on protein-carbohydrate affinity, unravelling important aspects that drive binding recognition.
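CSM-carbohydrate is evaluated with two figures, Pearson's correlation and RMSE. The minimal sketch below computes both from their standard definitions. The function names and the example affinity values (in kcal/mol) are hypothetical and are not taken from the paper's data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error, in the same unit as the inputs (e.g. kcal/mol)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Hypothetical binding affinities (kcal/mol), not data from the paper.
observed = [-8.0, -6.5, -7.2, -5.1]
predicted = [-7.6, -6.9, -7.0, -5.8]
print(f"r = {pearson_r(predicted, observed):.2f}, "
      f"RMSE = {rmse(predicted, observed):.2f} kcal/mol")
```

Reporting both metrics is informative because a model can correlate well with experiment while being systematically offset, which only the RMSE reveals.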
Affiliation(s)
- Thanh Binh Nguyen
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia; Department of Biochemistry, University of Cambridge, Cambridge, UK
13
Rodrigues CHM, Pires DEV, Ascher DB. pdCSM-PPI: Using Graph-Based Signatures to Identify Protein-Protein Interaction Inhibitors. J Chem Inf Model 2021; 61:5438-5445. [PMID: 34719929 DOI: 10.1021/acs.jcim.1c01135] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/29/2022]
Abstract
Protein-protein interactions are promising sites for development of selective drugs; however, they have generally been viewed as challenging targets. Molecules targeting protein-protein interactions tend to be larger and more lipophilic than other drug-like molecules, mimicking the properties of interacting interfaces. Here, we propose a machine learning approach that uses a graph-based representation of small molecules to guide identification of inhibitors modulating protein-protein interactions, pdCSM-PPI. This approach was applied to 21 different PPI targets. We developed interaction-specific models that were able to accurately identify active compounds achieving MCC and F1 scores up to 1, and Pearson's correlations up to 0.87, outperforming previous approaches. Using insights from these individual models, we developed a generic protein-protein interaction modulator predictive model, which accurately predicted IC50 with a Pearson's correlation of 0.64 on a low redundancy blind test. Importantly, we were able to accurately identify active from inactive compounds, achieving an AUC of 0.77 and sensitivity and specificity of 76% and 78%, respectively. We believe pdCSM-PPI will be an important tool to help guide more efficient screening of new PPI inhibitors; it is freely available as an easy-to-use web server and API at http://biosig.unimelb.edu.au/pdcsm_ppi.
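Several metrics quoted in these abstracts (sensitivity, specificity, F1 score and MCC) all derive from the same confusion-matrix counts. The sketch below uses the standard definitions. The function name and the example counts are hypothetical, chosen only so that sensitivity and specificity come out at 0.76 and 0.78.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall / true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts: 50 actives and 50 inactives.
m = confusion_metrics(tp=38, fp=11, tn=39, fn=12)
print({name: round(value, 2) for name, value in m.items()})
```

MCC uses all four counts, which is why it is often preferred over F1 when active and inactive classes are imbalanced.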
Affiliation(s)
- Carlos H M Rodrigues
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia; School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
14
da Silva BM, Myung Y, Ascher DB, Pires DEV. epitope3D: a machine learning method for conformational B-cell epitope prediction. Brief Bioinform 2021; 23:6407730. [PMID: 34676398 DOI: 10.1093/bib/bbab423] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/07/2021] [Revised: 08/25/2021] [Accepted: 09/14/2021] [Indexed: 11/13/2022]
Abstract
The ability to identify antigenic determinants of pathogens, or epitopes, is fundamental to guide rational vaccine development and immunotherapies, which are particularly relevant for rapid pandemic response. A range of computational tools has been developed over the past two decades to assist in epitope prediction; however, they have shown limited performance and generalization, particularly for the identification of conformational B-cell epitopes. Here, we present epitope3D, a novel scalable machine learning method capable of accurately identifying conformational epitopes, trained and evaluated on the largest curated epitope data set to date. Our method uses the concept of graph-based signatures to model epitope and non-epitope regions as graphs and extract distance patterns that are used as evidence to train and test predictive models. We show that epitope3D outperforms available alternative approaches, achieving Matthews correlation coefficient (MCC) and F1-score values of 0.55 and 0.57, respectively, under cross-validation, and 0.45 and 0.36, respectively, in independent blind tests.
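Graph-based signatures of the CSM family broadly summarise a structure by counting pairs of labelled atoms that fall within a series of distance cutoffs. The sketch below is a simplified, hypothetical version of that idea and not epitope3D's actual feature set. The atom labels, cutoff values and function names are assumptions; the published method uses its own atom classes and parameters.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (atom class label, xyz coordinates in Å)


def distance_signature(
    atoms: Sequence[Atom], cutoffs: Sequence[float]
) -> Dict[Tuple[float, Tuple[str, str]], int]:
    """Count atom pairs per (cutoff, label pair) whose distance is <= cutoff.

    Each pair is counted at every cutoff it fits under, so counts are
    cumulative over increasing cutoffs, as in a cutoff-scanning signature.
    """
    labels = sorted({label for label, _ in atoms})
    label_pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i:]]
    signature = {(c, pair): 0 for c in cutoffs for pair in label_pairs}
    for (label_a, xyz_a), (label_b, xyz_b) in combinations(atoms, 2):
        d = math.dist(xyz_a, xyz_b)
        pair = tuple(sorted((label_a, label_b)))
        for c in cutoffs:
            if d <= c:
                signature[(c, pair)] += 1
    return signature


def to_feature_vector(signature: Dict) -> List[int]:
    """Flatten the signature in a fixed key order so it can feed an ML model."""
    return [signature[key] for key in sorted(signature)]


# Hypothetical toy region with three labelled atoms.
region = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (1.0, 0.0, 0.0)),
    ("hydrophobic", (0.0, 3.0, 0.0)),
]
sig = distance_signature(region, cutoffs=[2.0, 4.0])
print(to_feature_vector(sig))  # [0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0]
```

Such fixed-length vectors, computed for both epitope and non-epitope regions, are what a standard classifier could then be trained on.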
Affiliation(s)
- Bruna Moreira da Silva
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- YooChan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Victoria, Australia; Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
15
Rodrigues CHM, Pires DEV, Ascher DB. mmCSM-PPI: predicting the effects of multiple point mutations on protein-protein interactions. Nucleic Acids Res 2021; 49:W417-W424. [PMID: 33893812 PMCID: PMC8262703 DOI: 10.1093/nar/gkab273] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/27/2021] [Revised: 03/18/2021] [Accepted: 04/15/2021] [Indexed: 11/16/2022]
Abstract
Protein-protein interactions play a crucial role in all cellular functions and biological processes, and mutations leading to their disruption are enriched in many diseases. While a number of computational methods to assess the effects of variants on protein-protein binding affinity have been proposed, they are generally limited to the analysis of single point mutations and have been shown to perform poorly on independent test sets. Here, we present mmCSM-PPI, a scalable and effective machine learning model for accurately assessing changes in protein-protein binding affinity caused by single and multiple missense mutations. We expanded our well-established graph-based signatures to capture physicochemical and geometrical properties of multiple wild-type residue environments and integrated them with substitution scores and dynamics terms from normal mode analysis. mmCSM-PPI achieved a Pearson's correlation of up to 0.75 (RMSE = 1.64 kcal/mol) under 10-fold cross-validation and 0.70 (RMSE = 2.06 kcal/mol) on a non-redundant blind test, outperforming existing methods. Our method is freely available as a user-friendly web server and API at http://biosig.unimelb.edu.au/mmcsm_ppi.
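mmCSM-PPI reports performance under 10-fold cross-validation. The minimal sketch below shows one way to generate such splits. The function name and the seeded shuffle are assumptions, not the authors' code.

```python
import random
from typing import List, Tuple


def k_fold_splits(
    n_samples: int, k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


for train, test in k_fold_splits(n_samples=25, k=10)[:2]:
    print(len(train), len(test))
```

Plain random folds are the simplest choice. When several mutations come from the same complex, assigning whole complexes to folds is a common way to limit information leakage between training and test data.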
Affiliation(s)
- Carlos H M Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Department of Biochemistry, University of Cambridge, Cambridge, UK
16
Al-Jarf R, de Sá AGC, Pires DEV, Ascher DB. pdCSM-cancer: Using Graph-Based Signatures to Identify Small Molecules with Anticancer Properties. J Chem Inf Model 2021; 61:3314-3322. [PMID: 34213323 PMCID: PMC8317153 DOI: 10.1021/acs.jcim.1c00168] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/14/2022]
Abstract
The development of new, effective, and safe drugs to treat cancer remains a challenging and time-consuming task due to limited hit rates, which restrain subsequent development efforts. Quantitative structure–activity relationship and machine learning-based models for predicting molecule pharmacodynamics and bioactivity have made impressive progress, yet they have had mixed success at identifying compounds with anticancer properties against multiple cell lines. Here, we have developed a novel predictive tool, pdCSM-cancer, which uses a graph-based signature representation of the chemical structure of a small molecule to accurately predict molecules likely to be active against one or multiple cancer cell lines. pdCSM-cancer represents the most comprehensive anticancer bioactivity prediction platform developed to date. It comprises models trained and validated on experimental data for the concentration causing 50% growth inhibition (GI50), covering over 18,000 compounds tested on 9 tumor types and 74 distinct cancer cell lines. Across 10-fold cross-validation, it achieved Pearson's correlation coefficients of up to 0.74, with comparable performance of up to 0.67 on independent, non-redundant blind tests. Leveraging the insights from these cell line-specific models, we developed a generic predictive model to identify molecules active in at least 60 cell lines. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.94 both on 10-fold cross-validation and on independent non-redundant blind tests, outperforming alternative approaches. We believe that our predictive tool will provide a valuable resource for optimizing and enriching screening libraries for the identification of effective and safe anticancer molecules. To provide a simple and integrated platform to rapidly screen for potential biologically active molecules with favorable anticancer properties, we made pdCSM-cancer freely available online at http://biosig.unimelb.edu.au/pdcsm_cancer.
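The AUC reported for pdCSM-cancer equals the probability that a randomly chosen active molecule is scored above a randomly chosen inactive one. The pairwise sketch below computes it directly from that definition, with ties counted as half. The function name and the example data are hypothetical. Because its cost is quadratic, this form suits only small sets.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels are 1 (active) or 0 (inactive); tied scores contribute 0.5.
    This O(P * N) pairwise form follows the definition and is not optimised.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical activity labels and model scores.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```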
Affiliation(s)
- Raghad Al-Jarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Alex G C de Sá
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia; Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, United Kingdom
17
Qin T, Zhu Z, Wang XS, Xia J, Wu S. Computational representations of protein-ligand interfaces for structure-based virtual screening. Expert Opin Drug Discov 2021; 16:1175-1192. [PMID: 34011222 DOI: 10.1080/17460441.2021.1929921] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Introduction: Structure-based virtual screening (SBVS) is an essential strategy for hit identification. SBVS primarily uses molecular docking, which exploits the protein-ligand binding mode and associated affinity score for compound ranking. Previous studies have shown that computational representations of protein-ligand interfaces, together with machine learning models subsequently built on them, are effective in improving the accuracy of SBVS. Areas covered: The authors review computational methods for representing protein-ligand interfaces. These include traditional methods that use deliberately designed fingerprints and descriptors, and more recent methods that automatically extract features with deep learning. The effects of these methods on the performance of machine learning models are briefly discussed. Additionally, case studies that applied various computational representations to machine learning are cited with remarks. Expert opinion: Automatic extraction of binding features by deep learning, using a completely end-to-end representation, has become a trend. However, there is still plenty of scope for improvement. The interpretability of deep-learning models, the organization of data management, the quantity and quality of available data, and the optimization of hyperparameters could all affect the accuracy of feature extraction. In addition, other important structural factors such as water molecules and protein flexibility should be considered.
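A common example of the "deliberately designed fingerprints" mentioned above is a residue-level interaction fingerprint. The sketch below is a deliberately simple, hypothetical variant. Each binding-site residue gets one bit, which is set when any of its atoms lies within an assumed 4.0 Å of any ligand atom. A Tanimoto similarity then compares two poses. The residue names, coordinates and cutoff are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_fingerprint(
    residue_atoms: Dict[str, List[Coord]],
    ligand_atoms: Sequence[Coord],
    residue_order: Sequence[str],
    cutoff: float = 4.0,
) -> List[int]:
    """One bit per residue in residue_order: 1 if any of its atoms is within cutoff Å of the ligand."""
    bits = []
    for residue in residue_order:
        atoms = residue_atoms.get(residue, [])
        in_contact = any(
            math.dist(atom, lig) <= cutoff for atom in atoms for lig in ligand_atoms
        )
        bits.append(int(in_contact))
    return bits


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity of two equal-length bit vectors (0.0 if no bit is set in either)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical binding site: residue name -> atom coordinates (Å).
site = {"ASP25": [(0.0, 0.0, 0.0)], "GLY27": [(3.0, 0.0, 0.0)], "ILE50": [(10.0, 0.0, 0.0)]}
order = ["ASP25", "GLY27", "ILE50"]
pose_a = contact_fingerprint(site, [(1.0, 0.0, 0.0)], order)
pose_b = contact_fingerprint(site, [(5.0, 0.0, 0.0)], order)
print(pose_a, pose_b, tanimoto(pose_a, pose_b))  # [1, 1, 0] [0, 1, 0] 0.5
```

Fixed-length vectors like these can be fed to a machine learning scoring model or used to rank docking poses by similarity to a reference pose.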
Affiliation(s)
- Tong Qin
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zihao Zhu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiang Simon Wang
- Artificial Intelligence and Drug Discovery Core Laboratory for District of Columbia Center for AIDS Research (DC CFAR), Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, U.S.A
- Jie Xia
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Song Wu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of New Drug Research and Development, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
18
Structural basis of the human Scribble-Vangl2 association in health and disease. Biochem J 2021; 478:1321-1332. [PMID: 33684218 PMCID: PMC8038854 DOI: 10.1042/bcj20200816] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/15/2020] [Revised: 02/24/2021] [Accepted: 03/08/2021] [Indexed: 01/01/2023]
Abstract
Scribble is a critical cell polarity regulator that has been shown to act as either an oncogene or a tumor suppressor in a context-dependent manner, and that also affects cell migration, tissue architecture and immunity. Mutations in Scribble lead to neural tube defects in mice and humans, an effect that has been attributed to a loss of interaction with the planar cell polarity regulator Vangl2. We show that Scribble PDZ domains 1, 2 and 3 are able to interact with the C-terminal PDZ-binding motif of Vangl2, and we have now determined crystal structures of these Scribble PDZ domains bound to the Vangl2 peptide. Mapping of mammalian neural tube defect mutations reveals that mutations located distal to the canonical PDZ domain ligand-binding groove can not only ablate binding to Vangl2 but also disrupt binding to multiple other signaling regulators. Our findings suggest that PDZ-associated neural tube defect mutations in Scribble may act not simply in a Vangl2-dependent manner but as broad-spectrum loss-of-function mutants, by disrupting the global Scribble-mediated interaction network.
19
Tunstall T, Portelli S, Phelan J, Clark TG, Ascher DB, Furnham N. Combining structure and genomics to understand antimicrobial resistance. Comput Struct Biotechnol J 2020; 18:3377-3394. [PMID: 33294134 PMCID: PMC7683289 DOI: 10.1016/j.csbj.2020.10.017] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/18/2020] [Revised: 10/15/2020] [Accepted: 10/17/2020] [Indexed: 02/07/2023]
Abstract
Antimicrobials against bacterial, viral and parasitic pathogens have transformed human and animal health. Nevertheless, their widespread use (and misuse) has led to the emergence of antimicrobial resistance (AMR), which poses a potentially catastrophic threat to public health and animal husbandry. There are several routes, both intrinsic and acquired, by which AMR can develop. One major route is through non-synonymous single nucleotide polymorphisms (nsSNPs) in coding regions. Large-scale genomic studies using high-throughput sequencing data have provided powerful new ways to rapidly detect and respond to such genetic mutations linked to AMR. However, these studies offer limited mechanistic insight. Computational tools can rapidly and inexpensively evaluate the effect of mutations on protein function and evolution. The resulting insights can then inform experimental studies and direct existing or new computational methods. Here we review a range of sequence- and structure-based computational tools, focussing on those successfully used to investigate the effects of mutations on drug targets in clinically important pathogens, particularly Mycobacterium tuberculosis. Combining genomic results with the biophysical effects of mutations can help reveal the molecular basis and consequences of resistance development. Furthermore, we summarise how such a mechanistic understanding of drug resistance can be applied to limit the impact of AMR.
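To make the distinction between synonymous and non-synonymous changes concrete, the sketch below classifies a single-nucleotide substitution. It translates the affected codon with the standard genetic code (NCBI translation table 1). The function name, the 0-based coordinate convention and the toy sequence are hypothetical choices for illustration. Real variant pipelines also handle annotation, strand and reference-genome details.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a single-nucleotide substitution in an in-frame coding sequence.

    cds starts at the first base of the start codon; pos is 0-based.
    Returns 'synonymous' or a non-synonymous class with protein-level notation.
    """
    cds, alt = cds.upper(), alt.upper()
    if not 0 <= pos < len(cds) or alt not in BASES:
        raise ValueError("invalid position or alternate base")
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3 or any(b not in BASES for b in ref_codon):
        raise ValueError("position falls in an incomplete or ambiguous codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    codon_number = start // 3 + 1
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return f"nonsense {ref_aa}{codon_number}*"
    if ref_aa == "*":
        return f"stop-loss *{codon_number}{alt_aa}"
    return f"missense {ref_aa}{codon_number}{alt_aa}"


toy_cds = "ATGTCGGAA"  # hypothetical: Met-Ser-Glu
print(classify_snv(toy_cds, 3, "C"))  # missense S2P
print(classify_snv(toy_cds, 5, "A"))  # synonymous
print(classify_snv(toy_cds, 6, "T"))  # nonsense E3*
```

Only the non-synonymous classes change the protein sequence, so they are the ones that structure-based tools can assess further.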
Affiliation(s)
- Tanushree Tunstall
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
- Stephanie Portelli
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Australia
- Jody Phelan
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
- Taane G. Clark
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
- David B. Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Australia
- Nicholas Furnham
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
20
Portelli S, Myung Y, Furnham N, Vedithi SC, Pires DEV, Ascher DB. Prediction of rifampicin resistance beyond the RRDR using structure-based machine learning approaches. Sci Rep 2020; 10:18120. [PMID: 33093532 PMCID: PMC7581776 DOI: 10.1038/s41598-020-74648-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/11/2020] [Accepted: 09/21/2020] [Indexed: 01/23/2023]
Abstract
Rifampicin resistance is a major therapeutic challenge, particularly in tuberculosis, leprosy, P. aeruginosa and S. aureus infections, where it develops via missense mutations in the rpoB gene. We have previously highlighted that these mutations reduce protein affinities within the RNA polymerase complex, subsequently reducing nucleic acid affinity. Here, we have used these insights to develop a computational rifampicin resistance predictor capable of identifying resistance mutations even outside the well-defined rifampicin resistance determining region (RRDR), using clinical M. tuberculosis sequencing information. Our tool correctly classified up to 90.9% of M. tuberculosis rpoB variants, with a sensitivity of 92.2%, a specificity of 83.6% and an MCC of 0.69, outperforming the current gold standard, GeneXpert-MTB/RIF. We show that our model can be translated to other clinically relevant organisms (M. leprae, P. aeruginosa and S. aureus) despite low sequence identity. Our method was implemented as an interactive tool, SUSPECT-RIF (StrUctural Susceptibility PrEdiCTion for RIFampicin), freely available at https://biosig.unimelb.edu.au/suspect_rif/.
Affiliation(s)
- Stephanie Portelli
- Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Victoria, 3010, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, 3004, VIC, Australia
- Yoochan Myung
- Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Victoria, 3010, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, 3004, VIC, Australia
- Nicholas Furnham
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, 3004, VIC, Australia
- School of Computing and Information Systems, University of Melbourne, Victoria, 3010, Australia
- David B Ascher
- Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Victoria, 3010, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, 3004, VIC, Australia
- Department of Biochemistry, University of Cambridge, Cambridge, UK
21
Abstract
Mutations in protein-coding regions can lead to large biological changes and are associated with genetic conditions, including cancers and Mendelian diseases, as well as drug resistance. Although whole genome and exome sequencing help to elucidate potential genotype-phenotype correlations, there is a large gap between the identification of new variants and deciphering their molecular consequences. A comprehensive understanding of these mechanistic consequences is crucial to better understand and treat diseases in a more personalized and effective way. This is particularly relevant considering estimates that over 80% of mutations associated with a disease are incorrectly assumed to be causative. A thorough analysis of potential effects of mutations is required to correctly identify the molecular mechanisms of disease and enable the distinction between disease-causing and non-disease-causing variation within a gene. Here we present an overview of our integrative mutation analysis platform, which focuses on refining the current genotype-phenotype correlation methods by using the wealth of protein structural information.
22
Newaz K, Ghalehnovi M, Rahnama A, Antsaklis PJ, Milenković T. Network-based protein structural classification. R Soc Open Sci 2020; 7:191461. [PMID: 32742675 PMCID: PMC7353965 DOI: 10.1098/rsos.191461] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/23/2019] [Accepted: 05/05/2020] [Indexed: 06/11/2023]
Abstract
Experimental determination of protein function is resource-consuming. As an alternative, computational prediction of protein function has received attention. In this context, protein structural classification (PSC) can help, by allowing for determining structural classes of currently unclassified proteins based on their features, and then relying on the fact that proteins with similar structures have similar functions. Existing PSC approaches rely on sequence-based or direct three-dimensional (3D) structure-based protein features. By contrast, we first model 3D structures of proteins as protein structure networks (PSNs). Then, we use network-based features for PSC. We propose the use of graphlets, state-of-the-art features in many research areas of network science, in the task of PSC. Moreover, because graphlets can deal only with unweighted PSNs, and because accounting for edge weights when constructing PSNs could improve PSC accuracy, we also propose a deep learning framework that automatically learns network features from weighted PSNs. When evaluated on a large set of approximately 9400 CATH and approximately 12 800 SCOP protein domains (spanning 36 PSN sets), the best of our proposed approaches are superior to existing PSC approaches in terms of accuracy, with comparable running times. Our data and code are available at https://doi.org/10.5281/zenodo.3787922.
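The abstract's first step, modeling a 3D structure as a protein structure network (PSN), and the idea of graphlet features can be illustrated with a small sketch. It assumes one node per residue (for example, its Cα coordinate) and an unweighted edge whenever two residues lie within a distance cutoff. The 6 Å default and the names `build_psn` and `count_3node_graphlets` are hypothetical choices for illustration, not the authors' settings or code. The sketch counts only the two connected 3-node graphlets (open paths and triangles); graphlet-based features typically consider larger graphlets as well.

```python
import math
from itertools import combinations


def build_psn(coords, cutoff=6.0):
    """Build an unweighted protein structure network as an adjacency dict.

    coords: list of (x, y, z) tuples, one per residue (hypothetically C-alpha).
    Two residues are linked when their Euclidean distance is <= cutoff.
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_3node_graphlets(adj):
    """Count the two connected 3-node graphlets: open paths and triangles."""
    paths = 0
    triangle_hits = 0
    for nbrs in adj.values():
        # Every pair of neighbours of a node forms a 3-node connected subgraph.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                triangle_hits += 1  # each triangle is seen once from each vertex
            else:
                paths += 1  # open path a - node - b, seen once from its centre
    return paths, triangle_hits // 3
```

For example, four residues in which only residues 0 and 2 lie farther apart than the cutoff give two triangles and two open paths.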
Affiliation(s)
- Khalique Newaz
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN 46556, USA
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- Mahboobeh Ghalehnovi
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Arash Rahnama
- Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Panos J. Antsaklis
- Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
- Center for Network and Data Science, University of Notre Dame, Notre Dame, IN 46556, USA
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
23
Ribeiro VS, Santana CA, Fassio AV, Cerqueira FR, da Silveira CH, Romanelli JPR, Patarroyo-Vargas A, Oliveira MGA, Gonçalves-Almeida V, Izidoro SC, de Melo-Minardi RC, Silveira SDA. visGReMLIN: graph mining-based detection and visualization of conserved motifs at 3D protein-ligand interface at the atomic level. BMC Bioinformatics 2020; 21:80. [PMID: 32164574 PMCID: PMC7068867 DOI: 10.1186/s12859-020-3347-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
Background: Interactions between proteins and non-protein small-molecule ligands play important roles in the biological processes of living systems. Computational methods that support our understanding of the ligand-receptor recognition process are therefore of fundamental importance, since they are a major step towards ligand prediction, target identification, lead discovery, and more. This article presents visGReMLIN, a web server that couples a graph mining-based strategy to detect motifs at the protein-ligand interface with an interactive platform to visually explore and interpret these motifs in the context of protein-ligand interfaces. Results: To illustrate the potential of visGReMLIN, we conducted two case studies in which our strategy was compared with previous experimentally and computationally determined results. visGReMLIN allowed us to detect patterns previously documented in the literature in an entirely visual manner. In addition, we found some motifs that we believe are relevant to protein-ligand interactions in the analyzed datasets. Conclusions: We aimed to build a visual analytics-oriented web server to detect and visualize common motifs at the protein-ligand interface. visGReMLIN motifs can help users gain insight into the key atoms and residues responsible for protein-ligand interactions in a dataset of complexes.
Affiliation(s)
- Vagner S Ribeiro
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Charles A Santana
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Alexandre V Fassio
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Fabio R Cerqueira
- Department of Production Engineering, Universidade Federal Fluminense, Petrópolis, 25650-050, Brazil
- Carlos H da Silveira
- Department of Computer Engineering, Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira, 35903-087, Brazil
- João P R Romanelli
- Department of Computer Engineering, Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira, 35903-087, Brazil
- Adriana Patarroyo-Vargas
- Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Maria G A Oliveira
- Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Instituto de Biotecnologia aplicada à Agropecuária (BIOAGRO), Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Valdete Gonçalves-Almeida
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Sandro C Izidoro
- Department of Computer Engineering, Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira, 35903-087, Brazil
- Raquel C de Melo-Minardi
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK
24
Pandurangan AP, Blundell TL. Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning. Protein Sci 2020; 29:247-257. [PMID: 31693276 PMCID: PMC6933854 DOI: 10.1002/pro.3774] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 08/30/2019] [Revised: 10/31/2019] [Accepted: 10/31/2019] [Indexed: 02/02/2023]
Abstract
Next-generation sequencing methods have not only allowed an understanding of genome sequence variation during the evolution of organisms but have also provided invaluable information about genetic variants in inherited disease and the emergence of drug resistance in cancers and infectious disease. A challenge is to distinguish mutations that are drivers of disease or drug resistance from passengers that are neutral or even selectively advantageous to the organism. This requires an understanding of the impacts of missense mutations on gene expression and regulation, and on the disruption of protein function by changes in protein stability or by interference with interactions with proteins, nucleic acids, small-molecule ligands, and other biological molecules. Experimental approaches to understanding differences between wild-type and mutant proteins are the most accurate but are also time-consuming and costly. Computational tools that predict the impacts of mutations can provide useful information more quickly. Here, we focus on two widely used structure-based approaches, originally developed in the Blundell lab: site-directed mutator (SDM), a statistical approach to analyze amino acid substitutions, and mutation cutoff scanning matrix (mCSM), which uses graph-based signatures to represent the wild-type structural environment and machine learning to predict the effect of mutations on protein stability. We also describe DUET, which uses machine learning to combine the two approaches. We briefly discuss the development of mCSM for understanding the impacts of mutations on interfaces with other proteins, nucleic acids, and ligands, and we exemplify the wide application of these approaches to understanding human genetic disorders and drug resistance mutations relevant to cancer and mycobacterial infections.
STATEMENT FOR A BROADER AUDIENCE: Genetic or somatic changes in human genes can alter the encoded proteins, giving rise to genetic disorders or cancer, and changes in the genes of pathogens can lead to drug resistance. The computer software described here, using statistical approaches or machine learning, combines information from genome sequencing of humans and pathogens with experimental or modeled 3D structures of the gene products, the proteins, to predict the impacts of mutations in genetic disease, cancer and drug resistance.
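The graph-based signature behind mCSM can be pictured with a simplified sketch. As we read it, the "cutoff scanning" idea is to label atoms by class and count atom pairs of each class pair under progressively larger distance cutoffs, producing a cumulative, fixed-length vector that a machine learning model can consume. The class labels, the 6 Å range and 2 Å step, and the name `cutoff_scanning_signature` are hypothetical; this is an illustration under those assumptions, not the mCSM implementation, and it omits any further features the published method may add.

```python
import math
from itertools import combinations, combinations_with_replacement


def cutoff_scanning_signature(atoms, classes, max_cutoff=6.0, step=2.0):
    """Cumulative count of atom pairs, per class pair, within increasing cutoffs.

    atoms: list of (label, (x, y, z)); every label must appear in `classes`.
    Returns a flat list: for each cutoff, one count per unordered class pair,
    with class pairs in sorted order.
    """
    class_pairs = list(combinations_with_replacement(sorted(classes), 2))
    n_steps = int(round(max_cutoff / step))
    cutoffs = [step * (k + 1) for k in range(n_steps)]
    signature = []
    for cutoff in cutoffs:
        counts = dict.fromkeys(class_pairs, 0)
        for (label_a, pos_a), (label_b, pos_b) in combinations(atoms, 2):
            if math.dist(pos_a, pos_b) <= cutoff:
                # Sort the labels so (A, B) and (B, A) share one bin.
                counts[tuple(sorted((label_a, label_b)))] += 1
        signature.extend(counts[pair] for pair in class_pairs)
    return signature
```

Because larger cutoffs include every pair counted at smaller ones, each bin can only grow from one cutoff to the next, which is what makes the signature cumulative.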
Affiliation(s)
- Arun Prasad Pandurangan
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- MRC Laboratory of Molecular Biology, Cambridge, UK
- Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, UK
25
dendPoint: a web resource for dendrimer pharmacokinetics investigation and prediction. Sci Rep 2019; 9:15465. [PMID: 31664080 PMCID: PMC6820739 DOI: 10.1038/s41598-019-51789-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 09/14/2018] [Accepted: 09/24/2019] [Indexed: 01/01/2023]
Abstract
Nanomedicine development currently suffers from a lack of efficient tools to predict pharmacokinetic behavior without relying on testing in large numbers of animals, which affects success rates and development costs. This work presents dendPoint, the first in silico model to predict the intravenous pharmacokinetics of dendrimers, a commonly explored drug vector, based on physicochemical properties. We have manually curated the largest relational database of dendrimer pharmacokinetic parameters and their structural/physicochemical properties. This was used to develop a machine learning-based model capable of accurately predicting pharmacokinetic parameters, including half-life, clearance, volume of distribution and dose recovered in the liver and urine. dendPoint successfully predicts dendrimer pharmacokinetic properties, achieving correlations of up to r = 0.83 and Q² of up to 0.68. dendPoint is freely available as a user-friendly web service and database at http://biosig.unimelb.edu.au/dendpoint. This platform is ultimately expected to guide dendrimer construct design and refinement before more time-consuming and expensive in vivo testing is undertaken.
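The two reported metrics measure different things, and a short sketch of their standard definitions makes the distinction concrete. We assume Q² here means the usual cross-validated coefficient of determination, 1 − PRESS / SS_total computed from held-out predictions; the abstract does not define it explicitly. The function names and example values are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def q_squared(observed, cv_predicted):
    """Q^2 = 1 - PRESS / SS_total, using cross-validated predictions."""
    mean_o = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, cv_predicted))
    ss_total = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - press / ss_total
```

Predictions that are off by a constant (observed 1, 2, 3, 4 against predicted 2, 3, 4, 5) still give r = 1.0, while Q² drops to 0.2. Pearson r rewards linear association only, whereas Q² also penalizes systematic offsets, so reporting both is informative.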
26
A Computational Method to Propose Mutations in Enzymes Based on Structural Signature Variation (SSV). Int J Mol Sci 2019; 20:ijms20020333. [PMID: 30650542 PMCID: PMC6359350 DOI: 10.3390/ijms20020333] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/20/2018] [Revised: 12/29/2018] [Accepted: 01/06/2019] [Indexed: 11/26/2022]
Abstract
With the use of genetic engineering, modified and sometimes more efficient enzymes can be created for different purposes, including industrial applications. However, building modified enzymes depends on several in vitro experiments, which can make the process expensive and time-consuming. Computational approaches could therefore reduce costs and accelerate the discovery of new technological products. In this study, we present a method, called structural signature variation (SSV), to propose mutations that improve enzyme activity. SSV uses the structural signature variation between target enzymes and template enzymes (obtained from the literature) to determine whether randomly suggested mutations may provide some benefit for an enzyme, such as improved catalytic activity, half-life, and thermostability, or resistance to inhibition. To evaluate SSV, we carried out a case study that suggested mutations in β-glucosidases, essential enzymes used in biofuel production that suffer inhibition by their product. We collected 27 mutations described in the literature and manually classified them as beneficial or not. SSV classified the mutations with a precision of 0.89 and a specificity of 0.92. We then used SSV to propose mutations for Bgl1B, a low-performance β-glucosidase, and detected 15 mutations that could be beneficial. Three of these mutations (H228C, H228T, and H228V) have been linked in the literature to the mechanism of glucose tolerance and stimulation in GH1 β-glucosidases. Hence, SSV was capable of detecting promising mutations, already validated by in vitro experiments, that improved the inhibition resistance of a β-glucosidase and, consequently, its catalytic activity. SSV might be useful for the engineering of enzymes used in biofuel production or other industrial applications.
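The two metrics reported for SSV answer different questions: precision is the share of mutations flagged as beneficial that truly are beneficial, while specificity is the share of non-beneficial mutations that are correctly rejected. A minimal sketch of the standard confusion-matrix definitions follows, with sensitivity added for comparison; the function names and example labels are hypothetical and do not come from the SSV study.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for boolean labels (True = beneficial)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    fp = sum((not a) and p for a, p in pairs)
    tn = sum((not a) and (not p) for a, p in pairs)
    fn = sum(a and (not p) for a, p in pairs)
    return tp, fp, tn, fn


def precision(tp, fp):
    """Fraction of predicted-beneficial mutations that are truly beneficial."""
    return tp / (tp + fp)


def specificity(tn, fp):
    """Fraction of truly non-beneficial mutations predicted as non-beneficial."""
    return tn / (tn + fp)


def sensitivity(tp, fn):
    """Fraction of truly beneficial mutations that were detected."""
    return tp / (tp + fn)
```

With five hypothetical mutations labelled `[True, True, False, False, False]` and predicted as `[True, False, True, False, False]`, the counts are TP = 1, FP = 1, TN = 2, FN = 1, giving precision 0.5, specificity 2/3 and sensitivity 0.5.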
27
Albanaz ATS, Rodrigues CHM, Pires DEV, Ascher DB. Combating mutations in genetic disease and drug resistance: understanding molecular mechanisms to guide drug design. Expert Opin Drug Discov 2017; 12:553-563. [PMID: 28490289 DOI: 10.1080/17460441.2017.1322579] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/22/2022]
Abstract
INTRODUCTION: Mutations introduce diversity into genomes, leading to selective changes and driving evolution. These changes have contributed to the emergence of many of the major health concerns of the 21st century, from genetic diseases and cancers to the rise and spread of drug resistance. Systematic experimental testing of all mutations in a system of interest is impractical and not cost-effective, which has created interest in computational tools that reveal the molecular consequences of mutations in order to aid and guide rational experimentation. Areas covered: Here, the authors discuss the recent development of computational methods to understand the effects of coding mutations on protein function and interactions, particularly in the context of the 3D structure of the protein. Expert opinion: While significant progress has been made in developing innovative tools to understand and quantify the range of effects through which a mutation or a set of mutations can give rise to a phenotype, a large gap remains in integrating these predictions and drawing causal conclusions linking variants to phenotypes. This often requires a detailed understanding of the system being perturbed. However, as part of the drug development process, such analysis can be used preemptively, in a similar fashion to pharmacokinetic predictions, to guide the development of therapeutics and to help guide the design and analysis of clinical trials, patient treatment and public health policy strategies.
Affiliation(s)
- Amanda T S Albanaz
- Centro de Pesquisas René Rachou, FIOCRUZ, Belo Horizonte, MG, Brazil
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Carlos H M Rodrigues
- Centro de Pesquisas René Rachou, FIOCRUZ, Belo Horizonte, MG, Brazil
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Douglas E V Pires
- Centro de Pesquisas René Rachou, FIOCRUZ, Belo Horizonte, MG, Brazil
- David B Ascher
- Centro de Pesquisas René Rachou, FIOCRUZ, Belo Horizonte, MG, Brazil
- Department of Biochemistry, University of Cambridge, Cambridge, Cambridgeshire, UK
- Department of Biochemistry and Molecular Biology, University of Melbourne, Melbourne, Victoria, Australia
28
McSkimming DI, Rasheed K, Kannan N. Classifying kinase conformations using a machine learning approach. BMC Bioinformatics 2017; 18:86. [PMID: 28152981 PMCID: PMC5290640 DOI: 10.1186/s12859-017-1506-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 11/03/2016] [Accepted: 01/28/2017] [Indexed: 02/07/2023]
Abstract
Background: Signaling proteins such as protein kinases adopt a diverse array of conformations to respond to regulatory signals in signaling pathways. Perhaps the most fundamental conformational change of a kinase is the transition between active and inactive states, and defining the conformational features associated with kinase activation is critical for selectively targeting abnormally regulated kinases in diseases. While manual examination of crystal structures has led to the identification of key structural features associated with kinase activation, the large number of kinase crystal structures (~3,500) and the extensive conformational diversity displayed by the protein kinase superfamily pose unique challenges in fully defining these features. Although some computational approaches have been proposed, they are typically based on a small subset of crystal structures and use measurements biased towards the active-site geometry. Results: We use an unbiased, informatics-based machine learning approach to classify all eukaryotic protein kinase conformations deposited in the PDB. We show that the orientation of the activation segment, measured by φ, ψ, χ1, and pseudo-dihedral angles, classifies kinase crystal conformations more accurately than existing methods. We show that the formation of the K-E salt bridge is statistically dependent on the activation segment orientation, and we identify evolutionary differences between the activation segment conformations of tyrosine and serine/threonine kinases. We provide evidence that our method can identify conformational changes associated with the binding of allosteric regulatory proteins, and show that the greatest variation in inactive structures comes from kinase group- and family-specific side-chain orientations.
Conclusion: We have provided the first comprehensive machine learning-based classification of protein kinase active/inactive conformations, taking into account more structures and measurements than any previous classification effort. Further, our unbiased classification of inactive structures reveals residues associated with kinase functional specificity. To enable classification of new crystal structures, we have made our classifier publicly accessible as a stand-alone program at https://github.com/esbg/kinconform [DOI: 10.5281/zenodo.249090].
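The classifier's features φ, ψ, χ1 and the pseudo-dihedrals are all torsion angles defined by four points: φ by C(i−1), N, Cα, C; ψ by N, Cα, C, N(i+1); and χ1 by N, Cα, Cβ and the side chain's γ atom. Which atoms the authors use for the pseudo-dihedrals is not stated here, so the sketch only shows the generic computation. It uses the standard atan2 formulation with the IUPAC sign convention; the helper names are hypothetical and this is not code from the kinconform program.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in [-180, 180], defined by four 3D points.

    Positive when, looking along p1 -> p2, the bond p1-p0 must rotate
    clockwise to eclipse the bond p2-p3 (IUPAC convention).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Using `atan2` rather than `acos` of the normalized dot product keeps the sign of the angle, which matters here because φ and ψ values of opposite sign correspond to different backbone conformations.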
Affiliation(s)
- Khaled Rasheed
- Department of Computer Science, University of Georgia, Athens, GA, 30602, USA
- Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, Athens, GA, 30602, USA
- Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA, 30602, USA
29
Boari de Lima E, Meira W, de Melo-Minardi RC. Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering. PLoS Comput Biol 2016; 12:e1005001. [PMID: 27348631 PMCID: PMC4922564 DOI: 10.1371/journal.pcbi.1005001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/30/2015] [Accepted: 05/22/2016] [Indexed: 01/14/2023]
Abstract
As more and more genomes are sequenced, the vast majority of proteins may only be annotated computationally, because experimental investigation is extremely costly. This highlights the need for computational methods that determine protein functions quickly and reliably. We believe that dividing a protein family into subtypes that share specific functions uncommon to the whole family reduces the complexity of the function annotation problem. Hence, the purpose of this work is to detect isofunctional subfamilies inside a family of unknown function while identifying differentiating residues. Similarity between protein pairs according to various properties is interpreted as evidence of functional similarity. Data are integrated using genetic programming and provided to a spectral clustering algorithm, which creates clusters of similar proteins. The proposed framework was applied to well-known protein families and to a family of unknown function, and then compared to ASMC. Results showed that our fully automated technique obtained better clusters than ASMC for two families and equivalent results for the other two, including one whose clusters were manually defined. Clusters produced by our framework corresponded closely to the known subfamilies and were more contrasting than those produced by ASMC. Additionally, for the families whose specificity-determining positions are known, such residues were among those our technique considered most important for differentiating a given group. When run with the crotonase and enolase SFLD superfamilies, the results showed strong agreement with this gold standard. The best results consistently involved multiple data types, confirming our hypothesis that similarities according to different knowledge domains may be used as evidence of functional similarity.
Our main contributions are the proposed strategy for selecting and integrating data types, along with the ability to work with noisy and incomplete data; the use of domain knowledge to detect subfamilies with different specificities within a family, thus reducing the complexity of the experimental function characterization problem; and the identification of residues responsible for specificity.
Affiliation(s)
- Elisa Boari de Lima
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Wagner Meira
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
30
Pires DEV, Ascher DB. CSM-lig: a web server for assessing and comparing protein-small molecule affinities. Nucleic Acids Res 2016; 44:W557-61. [PMID: 27151202 PMCID: PMC4987933 DOI: 10.1093/nar/gkw390] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Received: 02/21/2016] [Accepted: 04/28/2016] [Indexed: 12/21/2022]
Abstract
Determining the affinity of a ligand for a given protein is a crucial component of drug development and of understanding the ligand's biological effects. Predicting binding affinities is a difficult task, and despite being regarded as poorly predictive, scoring functions play an important role in the analysis of molecular docking results. Here, we present CSM-Lig (http://structure.bioc.cam.ac.uk/csm_lig), a web server tailored to predict the binding affinity of a protein-small molecule complex, encompassing both protein and small-molecule complementarity in terms of shape and chemistry via graph-based structural signatures. CSM-Lig was trained and evaluated on different releases of the PDBbind databases, achieving a correlation of up to 0.86 on 10-fold cross-validation and 0.80 in blind tests, performing as well as or better than other widely used methods. The web server allows users to rapidly and automatically predict binding affinities of collections of structures and assess the interactions made. We believe CSM-Lig will be an invaluable tool for helping to assess docking poses and the effects of multiple mutations, including insertions, deletions and alternative splicing events, on protein-small molecule affinity, unraveling important aspects that drive protein-compound recognition.
Affiliation(s)
- Douglas E V Pires
- Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Belo Horizonte, 30190-002, Brazil
- David B Ascher
- Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Belo Horizonte, 30190-002, Brazil
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Department of Biochemistry, University of Melbourne, Victoria 3010, Australia
31
Computational approaches to study the effects of small genomic variations. J Mol Model 2015; 21:251. [PMID: 26350246 DOI: 10.1007/s00894-015-2794-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 04/08/2015] [Accepted: 08/23/2015] [Indexed: 10/23/2022]
Abstract
Advances in DNA sequencing technologies have led to an avalanche-like increase in the number of gene sequences deposited in public databases over the last decade as well as the detection of an enormous number of previously unseen nucleotide variants therein. Given the size and complex nature of the genome-wide sequence variation data, as well as the rate of data generation, experimental characterization of the disease association of each of these variations or their effects on protein structure/function would be costly, laborious, time-consuming, and essentially impossible. Thus, in silico methods to predict the functional effects of sequence variations are constantly being developed. In this review, we summarize the major computational approaches and tools that are aimed at the prediction of the functional effect of mutations, and describe the state-of-the-art databases that can be used to obtain information about mutation significance. We also discuss future directions in this highly competitive field.
32
Maghawry HA, Mostafa MGM, Gharib TF. A new protein structure representation for efficient protein function prediction. J Comput Biol 2015; 21:936-46. [PMID: 25343279 DOI: 10.1089/cmb.2014.0137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023]
Abstract
One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is a key criterion for classifying proteins. It can be inferred experimentally at very low throughput or computationally at very high throughput. Computational methods are either sequence-based or structure-based, and structure-based methods produce more accurate protein function predictions. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity across six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. Predictions using the proposed representation are more accurate than those using a recently introduced representation based only on distance patterns. The results show that the proposed representation achieved a prediction accuracy of up to 98%, an improvement of about 10% on average.
Affiliation(s)
- Huda A Maghawry
- Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
33
Pires DEV, Blundell TL, Ascher DB. pkCSM: Predicting Small-Molecule Pharmacokinetic and Toxicity Properties Using Graph-Based Signatures. J Med Chem 2015; 58:4066-72. [PMID: 25860834 PMCID: PMC4434528 DOI: 10.1021/acs.jmedchem.5b00104] [Citation(s) in RCA: 1873] [Impact Index Per Article: 208.1] [Indexed: 12/15/2022]
Abstract
Drug development has a high attrition rate, with poor pharmacokinetic and safety properties a significant hurdle. Computational approaches may help minimize these risks. We have developed a novel approach (pkCSM) that uses graph-based signatures to develop predictive models of central ADMET properties for drug development. pkCSM performs as well as or better than current methods. A freely accessible web server (http://structure.bioc.cam.ac.uk/pkcsm), which retains no information submitted to it, provides an integrated platform to rapidly evaluate pharmacokinetic and toxicity properties.
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Sanger Building, Cambridge, Cambridgeshire CB2 1GA, UK
- Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Sanger Building, Cambridge, Cambridgeshire CB2 1GA, UK
- David B Ascher
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Sanger Building, Cambridge, Cambridgeshire CB2 1GA, UK
34
From local to global changes in proteins: a network view. Curr Opin Struct Biol 2015; 31:1-8. [DOI: 10.1016/j.sbi.2015.02.015] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 10/15/2014] [Revised: 02/15/2015] [Accepted: 02/26/2015] [Indexed: 02/01/2023]
35
Ascher DB, Jubb HC, Pires DEV, Ochi T, Higueruelo A, Blundell TL. Protein-Protein Interactions: Structures and Druggability. Multifaceted Roles of Crystallography in Modern Drug Discovery 2015. [DOI: 10.1007/978-94-017-9719-1_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/30/2022]
36
Gossage L, Pires DEV, Olivera-Nappa Á, Asenjo J, Bycroft M, Blundell TL, Eisen T. An integrated computational approach can classify VHL missense mutations according to risk of clear cell renal carcinoma. Hum Mol Genet 2014; 23:5976-88. [PMID: 24969085 PMCID: PMC4204774 DOI: 10.1093/hmg/ddu321] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 05/25/2014] [Revised: 05/25/2014] [Accepted: 06/17/2014] [Indexed: 12/26/2022]
Abstract
Mutations in the von Hippel-Lindau (VHL) gene are pathogenic in VHL disease, congenital polycythaemia and clear cell renal carcinoma (ccRCC). pVHL forms a ternary complex with elongin C and elongin B, critical for pVHL stability and function, which interacts with Cullin-2 and RING-box protein 1 to target hypoxia-inducible factor for polyubiquitination and proteasomal degradation. We describe a comprehensive database of missense VHL mutations linked to experimental and clinical data. We use predictions from in silico tools to link the functional effects of missense VHL mutations to phenotype. The risk of ccRCC in VHL disease is linked to the degree of destabilization resulting from missense mutations. An optimized binary classification system (symphony), which integrates predictions from five in silico methods, can predict the risk of ccRCC associated with VHL missense mutations with high sensitivity and specificity. We use symphony to generate predictions for risk of ccRCC for all possible VHL missense mutations and present these predictions, in association with clinical and experimental data, in a publicly available, searchable web server.
Affiliation(s)
- Lucy Gossage
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Álvaro Olivera-Nappa
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK; Centre for Biochemical Engineering and Biotechnology, University of Chile, Beauchef 850, Santiago, Chile
- Juan Asenjo
- Centre for Biochemical Engineering and Biotechnology, University of Chile, Beauchef 850, Santiago, Chile
- Mark Bycroft
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Research Centre, Cambridge CB2 0QH, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Tim Eisen
- Department of Oncology, Cambridge University Hospitals NHS Foundation Trust, Box 193 (R4) Addenbrooke's Hospital, Cambridge Biomedical Campus, Hill's Road, Cambridge CB2 0QQ, UK
37. Silveira SA, Fassio AV, Gonçalves-Almeida VM, de Lima EB, Barcelos YT, Aburjaile FF, Rodrigues LM, Meira W, de Melo-Minardi RC. VERMONT: Visualizing mutations and their effects on protein physicochemical and topological property conservation. BMC Proc 2014; 8:S4. [PMID: 25237391 PMCID: PMC4155615 DOI: 10.1186/1753-6561-8-s2-s4]
Abstract
In this paper, we propose an interactive visualization called VERMONT that addresses the problem of visualizing mutations and inferring their possible effects on the conservation of physicochemical and topological properties in protein families. More specifically, we visualize a set of structure-based sequence alignments and integrate several structural parameters intended to help biologists gain insight into the possible consequences of mutations. Using VERMONT, we identified patterns of position-specific properties, as well as exceptions, that may help predict whether specific mutations could damage protein function.
Affiliation(s)
- Sabrina A Silveira
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Alexandre V Fassio
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Valdete M Gonçalves-Almeida
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Elisa B de Lima
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Yussif T Barcelos
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Flávia F Aburjaile
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Laerte M Rodrigues
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Wagner Meira
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Raquel C de Melo-Minardi
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
38. Pires DEV, Ascher DB, Blundell TL. DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach. Nucleic Acids Res 2014; 42:W314-9. [PMID: 24829462 PMCID: PMC4086143 DOI: 10.1093/nar/gku411]
Abstract
Cancer genome and other sequencing initiatives are generating extensive data on non-synonymous single nucleotide polymorphisms (nsSNPs) in human and other genomes. In order to understand the impacts of nsSNPs on the structure and function of the proteome, as well as to guide protein engineering, accurate in silico methodologies are required to study and predict their effects on protein stability. Despite the diversity of available computational methods in the literature, none has proven accurate and dependable on its own under all scenarios where mutation analysis is required. Here we present DUET, a web server for an integrated computational approach to study missense mutations in proteins. DUET consolidates two complementary approaches (mCSM and SDM) in a consensus prediction, obtained by combining the results of the separate methods in an optimized predictor using Support Vector Machines (SVM). We demonstrate that the proposed method improves overall accuracy of the predictions in comparison with either method individually and performs as well as or better than similar methods. The DUET web server is freely and openly available at http://structure.bioc.cam.ac.uk/duet.
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- David B Ascher
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK; ACRF Rational Drug Discovery Centre and Biota Structural Biology Laboratory, St Vincents Institute of Medical Research, Fitzroy, VIC 3065, Australia
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
39. Silveira SDA, de Melo-Minardi RC, da Silveira CH, Santoro MM, Meira Jr W. ENZYMAP: exploiting protein annotation for modeling and predicting EC number changes in UniProt/Swiss-Prot. PLoS One 2014; 9:e89162. [PMID: 24586563 PMCID: PMC3929618 DOI: 10.1371/journal.pone.0089162]
Abstract
The volume and diversity of biological data are increasing at very high rates. Vast amounts of protein sequences and structures, protein and genetic interactions and phenotype studies have been produced. Most data generated by high-throughput devices are annotated automatically, because manual annotation at this scale is not possible. Thus, efficient and precise automatic annotation methods are required to ensure the quality and reliability of both the biological data and associated annotations. We proposed ENZYMatic Annotation Predictor (ENZYMAP), a technique to characterize and predict EC number changes based on annotations from UniProt/Swiss-Prot using a supervised learning approach. We evaluated ENZYMAP experimentally, using test data sets from both UniProt/Swiss-Prot and UniProt/TrEMBL, and showed that predicting EC changes using selected types of annotation is possible. Finally, we compared ENZYMAP and DETECT with respect to their predictions and checked both against the UniProt/Swiss-Prot annotations. ENZYMAP was shown to be more accurate than DETECT, coming closer to the actual changes in UniProt/Swiss-Prot. ENZYMAP is intended as a complementary automatic method (one that can be used together with other techniques, such as those based on protein sequence and structure) that helps to improve the quality and reliability of enzyme annotations over time by suggesting possible corrections, anticipating annotation changes and propagating implicit knowledge across the whole dataset.
Affiliation(s)
- Sabrina de Azevedo Silveira
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Marcelo Matos Santoro
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Wagner Meira Jr
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
40. Pires DEV, Ascher DB, Blundell TL. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 2013; 30:335-42. [PMID: 24281696 PMCID: PMC3904523 DOI: 10.1093/bioinformatics/btt691]
Abstract
Motivation: Mutations play fundamental roles in evolution by introducing diversity into genomes. Missense mutations in structural genes may become either selectively advantageous or disadvantageous to the organism by affecting protein stability and/or interfering with interactions between partners. Thus, the ability to predict the impact of mutations on protein stability and interactions is of significant value, particularly in understanding the effects of Mendelian and somatic mutations on the progression of disease. Here, we propose a novel approach to the study of missense mutations, called mCSM, which relies on graph-based signatures. These encode distance patterns between atoms and are used to represent the protein residue environment and to train predictive models. To understand the roles of mutations in disease, we have evaluated their impacts not only on protein stability but also on protein–protein and protein–nucleic acid interactions. Results: We show that mCSM performs as well as or better than other methods that are used widely. The mCSM signatures were successfully used in different tasks demonstrating that the impact of a mutation can be correlated with the atomic-distance patterns surrounding an amino acid residue. We showed that mCSM can predict stability changes of a wide range of mutations occurring in the tumour suppressor protein p53, demonstrating the applicability of the proposed method in a challenging disease scenario. Availability and implementation: A web server is available at http://structure.bioc.cam.ac.uk/mcsm. Contact: dpires@dcc.ufmg.br; tom@cryst.bioc.cam.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
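The mCSM abstract describes its graph-based signatures only at a high level, as distance patterns between atoms that represent a residue's environment. The following is a minimal sketch of that general cutoff-scanning idea, under stated assumptions: each atom carries one hypothetical class label, and for every pair of labels we count atom pairs lying within a series of increasing distance cutoffs. The function name `cutoff_scanning_signature`, the labels, and the `step`/`max_cutoff` values are illustrative choices, not mCSM's published parameters, and this is not the authors' implementation.

```python
from itertools import combinations, combinations_with_replacement
from math import dist


def cutoff_scanning_signature(atoms, labels, max_cutoff=10.0, step=2.0):
    """Illustrative distance-pattern signature (hypothetical helper, not mCSM itself).

    atoms: list of (label, (x, y, z)) tuples; labels: the set of atom classes.
    Returns a flat list of cumulative pair counts: for each cutoff, one count per
    unordered label pair, counting atom pairs whose distance is <= that cutoff.
    """
    # Every unordered pair of labels, including same-label pairs, in a fixed order.
    pair_types = list(combinations_with_replacement(sorted(labels), 2))
    # Increasing distance cutoffs: step, 2*step, ..., up to max_cutoff.
    n_cutoffs = int(round(max_cutoff / step))
    cutoffs = [step * (i + 1) for i in range(n_cutoffs)]
    # Sorted label pair and Euclidean distance for every pair of atoms.
    atom_pairs = [
        (tuple(sorted((a_label, b_label))), dist(a_xyz, b_xyz))
        for (a_label, a_xyz), (b_label, b_xyz) in combinations(atoms, 2)
    ]
    signature = []
    for cutoff in cutoffs:
        for pair_type in pair_types:
            # Cumulative: a pair counted at a small cutoff is also counted at larger ones.
            signature.append(
                sum(1 for t, d in atom_pairs if t == pair_type and d <= cutoff)
            )
    return signature


if __name__ == "__main__":
    # Toy environment with two hypothetical atom classes.
    toy_atoms = [
        ("hydrophobic", (0.0, 0.0, 0.0)),
        ("hydrophobic", (1.0, 0.0, 0.0)),
        ("donor", (0.0, 3.0, 0.0)),
    ]
    print(cutoff_scanning_signature(toy_atoms, {"hydrophobic", "donor"}, max_cutoff=4.0, step=2.0))
```

Concatenating the counts over all cutoffs yields a fixed-length vector (number of cutoffs times number of label pairs) that a standard regressor or classifier could consume; the actual atom classes, cutoffs and predictive models used by mCSM are those described in the original paper.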
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK; ACRF Rational Drug Discovery Centre and Biota Structural Biology Laboratory, St Vincents Institute of Medical Research, Fitzroy, VIC, 3065, Australia
41. Pires DEV, de Melo-Minardi RC, da Silveira CH, Campos FF, Meira W. aCSM: noise-free graph-based signatures to large-scale receptor-based ligand prediction. Bioinformatics 2013; 29:855-61. [PMID: 23396119 DOI: 10.1093/bioinformatics/btt058]
Abstract
MOTIVATION Receptor-ligand interactions are a central phenomenon in most biological systems. They are characterized by molecular recognition, a complex process mainly driven by physicochemical and structural properties of both receptor and ligand. Understanding and predicting these interactions are major steps towards protein-ligand prediction, target identification, lead discovery and drug design. RESULTS We propose a novel graph-based binding-pocket signature called aCSM, which proved efficient and effective in handling large-scale protein-ligand prediction tasks. We compare our results with those described in the literature and demonstrate that our algorithm outperforms competing techniques. Finally, we predict novel ligands for proteins from Trypanosoma cruzi, the parasite responsible for Chagas disease, and validate them in silico via a docking protocol, showing the applicability of the method in suggesting ligands for pockets in a real-world scenario. AVAILABILITY AND IMPLEMENTATION Datasets and the source code are available at http://www.dcc.ufmg.br/~dpires/acsm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Douglas E V Pires
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha Belo Horizonte - MG, 31270-901, Brazil.
42. Volkamer A, Kuhn D, Rippmann F, Rarey M. Predicting enzymatic function from global binding site descriptors. Proteins 2012; 81:479-89. [DOI: 10.1002/prot.24205]