1
Zheng Y, Li Q, Freiberger MI, Song H, Hu G, Zhang M, Gu R, Li J. Predicting the Dynamic Interaction of Intrinsically Disordered Proteins. J Chem Inf Model 2024; 64:6768-6777. [PMID: 39163306 DOI: 10.1021/acs.jcim.4c00930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2024]
Abstract
Intrinsically disordered proteins (IDPs) participate in various biological processes. Interactions involving IDPs are usually dynamic and are affected by their inherent conformational fluctuations. Comprehensive characterization of these interactions with current techniques is challenging. Here, we present GSALIDP, a GraphSAGE-embedded LSTM network, to capture the dynamic nature of IDP-involved interactions and predict their behaviors. This framework models multiple conformations of an IDP as a dynamic graph, which can effectively describe the fluctuation of its flexible conformation. The dynamic interaction between IDPs is studied, and the data sets of IDP conformations and their interactions are obtained through atomistic molecular dynamics (MD) simulations. Residues of IDPs are encoded through a series of features, including their frustration. GSALIDP can effectively predict the interaction sites of IDPs and the contact residue pairs between IDPs. Its performance in predicting IDP interactions is on par with or even better than that of conventional models in predicting the interactions of structured proteins. To the best of our knowledge, this is the first model to extend protein interaction prediction to IDP-involved interactions.
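To make the "dynamic graph" idea concrete, here is a minimal NumPy sketch. It assumes each MD snapshot is given as a residue contact map, that one mean-aggregation GraphSAGE layer with weights shared across snapshots embeds every frame, and that a per-residue LSTM then runs over the sequence of frames. The layer sizes, random weights, random contact maps, and the function names (`sage_layer`, `lstm_step`, `dynamic_graph_embedding`) are illustrative assumptions, not GSALIDP's published architecture, and random numbers stand in for the frustration-based residue features.

```python
import numpy as np

rng = np.random.default_rng(0)


def sage_layer(h, adj, w_self, w_neigh):
    """GraphSAGE layer with mean aggregation: ReLU(h_v W_self + mean_{u in N(v)} h_u W_neigh)."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ h) / np.maximum(deg, 1.0)  # residues with no contacts get a zero mean
    return np.maximum(0.0, h @ w_self + neigh_mean @ w_neigh)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h, c, W, U, b):
    """One LSTM step; the four gate blocks are stacked as [input, forget, output, candidate]."""
    z = x @ W + h @ U + b
    i, f, o, g = np.split(z, 4, axis=-1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


def dynamic_graph_embedding(features, adjs, d_hidden=8):
    """Embed each snapshot with GraphSAGE (shared weights), then run a per-residue LSTM over snapshots."""
    n_res, d_in = features.shape
    w_self = rng.normal(scale=0.1, size=(d_in, d_hidden))
    w_neigh = rng.normal(scale=0.1, size=(d_in, d_hidden))
    W = rng.normal(scale=0.1, size=(d_hidden, 4 * d_hidden))
    U = rng.normal(scale=0.1, size=(d_hidden, 4 * d_hidden))
    b = np.zeros(4 * d_hidden)
    h = np.zeros((n_res, d_hidden))
    c = np.zeros((n_res, d_hidden))
    for adj in adjs:  # one graph per MD snapshot
        x_t = sage_layer(features, adj, w_self, w_neigh)
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h  # per-residue embedding summarising the whole trajectory


# Toy usage: 20 residues, 5 input features, 10 snapshots with random symmetric contact maps.
n_res, d_in, n_frames = 20, 5, 10
feats = rng.normal(size=(n_res, d_in))
adjs = []
for _ in range(n_frames):
    a = np.triu((rng.random((n_res, n_res)) < 0.2).astype(float), 1)
    adjs.append(a + a.T)  # symmetric, no self-contacts
emb = dynamic_graph_embedding(feats, adjs)
print(emb.shape)  # (20, 8)
```

Sharing the GraphSAGE weights across snapshots keeps the per-frame encoder identical, so the LSTM only sees how each residue's environment changes from frame to frame; a trained model would learn these weights rather than sample them.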
Affiliation(s)
- Yuchuan Zheng
- School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Qixiu Li
- School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Maria I Freiberger
- Protein Physiology Lab, Departamento de Quimica Biologica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires C1428EGA, Argentina
- Haoyu Song
- School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Guorong Hu
- School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Moxin Zhang
- School of Physics, Zhejiang University, Hangzhou 310058, PR China
- Ruoxu Gu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Jingyuan Li
- School of Physics, Zhejiang University, Hangzhou 310058, PR China
2
AlJarf R, Rodrigues CHM, Myung Y, Pires DEV, Ascher DB. piscesCSM: prediction of anticancer synergistic drug combinations. J Cheminform 2024; 16:81. [PMID: 39030592 PMCID: PMC11264925 DOI: 10.1186/s13321-024-00859-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Accepted: 05/12/2024] [Indexed: 07/21/2024]
Abstract
While drug combination therapies are of great importance, particularly in cancer treatment, identifying novel synergistic drug combinations has been a challenging venture. Computational methods have emerged in this context as a promising tool for prioritizing drug combinations for further evaluation, though they have presented limited performance, utility, and interpretability. Here, we propose a novel predictive tool, piscesCSM, that leverages graph-based representations to model small molecule chemical structures to accurately predict drug combinations with favourable anticancer synergistic effects against one or multiple cancer cell lines. Leveraging these insights, we developed a general supervised machine learning model to guide the prediction of anticancer synergistic drug combinations in over 30 cell lines. It achieved an area under the receiver operating characteristic curve (AUROC) of up to 0.89 on independent non-redundant blind tests, outperforming state-of-the-art approaches on both large-scale oncology screening data and an independent test set generated by AstraZeneca (with more than a 16% improvement in predictive accuracy). Moreover, by exploring the interpretability of our approach, we found that simple physicochemical properties and graph-based signatures are predictive of chemotherapy synergism. To provide a simple and integrated platform to rapidly screen potential candidate pairs with favourable synergistic anticancer effects, we made piscesCSM freely available online at https://biosig.lab.uq.edu.au/piscescsm/ as a web server and API. We believe that our predictive tool will provide a valuable resource for optimizing and augmenting combinatorial screening libraries to identify effective and safe synergistic anticancer drug combinations. 
SCIENTIFIC CONTRIBUTION: This work proposes piscesCSM, a machine-learning-based framework that relies on well-established graph-based representations of small molecules to identify synergistic drug combinations with better predictive accuracy. Our model, piscesCSM, shows that combining physicochemical properties with graph-based signatures can outperform current architectures on classification prediction tasks. Furthermore, implementing our tool as a web server offers a user-friendly platform for researchers to screen for potential synergistic drug combinations with favorable anticancer effects against one or multiple cancer cell lines.
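The headline metric for piscesCSM is AUROC. As a reminder of what that number measures, the plain-Python sketch below computes it through its rank interpretation: the probability that a randomly chosen synergistic pair scores higher than a randomly chosen non-synergistic one, with ties counted as half. The labels and scores are made up, and this is a generic definition of the metric, not piscesCSM's evaluation code.

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: P(score of a positive > score of a negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = synergistic pair, 0 = non-synergistic pair.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
print(round(auroc(labels, scores), 3))  # 0.889
```

The pairwise loop costs O(n_pos × n_neg) and is only meant for illustration; for large test sets a sort-based rank computation or an established library implementation is the usual choice.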
Affiliation(s)
- Raghad AlJarf
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Carlos H M Rodrigues
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Yoochan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
3
Zhang Z, Zhao L, Wang J, Wang C. A Hierarchical Graph Neural Network Framework for Predicting Protein-Protein Interaction Modulators With Functional Group Information and Hypergraph Structure. IEEE J Biomed Health Inform 2024; 28:4295-4305. [PMID: 38564358 DOI: 10.1109/jbhi.2024.3384238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Accurate prediction of small molecule modulators targeting protein-protein interactions (PPIMs) remains a significant challenge in drug discovery. Existing machine learning-based models rely on manual feature engineering, which is tedious and task-specific. Recently, deep learning models based on graph neural networks have made remarkable progress in molecular representation learning. However, many graph-based approaches ignore molecular hierarchical structure modeling guided by domain knowledge. In chemistry, the functional groups of a molecule determine its interaction with specific targets. Therefore, we propose a hierarchical graph neural network framework (called HiGPPIM) for predicting PPIMs by integrating atom-level and functional group-level features of molecules. HiGPPIM constructs atom-level and functional group-level graphs based on chemical knowledge and learns graph representations using graph attention networks. Furthermore, a hypergraph attention network is designed in HiGPPIM to aggregate and transform two-level graph information. We evaluate the performance of HiGPPIM on eight PPI families and two prediction tasks, namely PPIM identification and potency prediction. Experimental results demonstrate that HiGPPIM achieves state-of-the-art performance on both tasks and that using functional group information to guide PPIM prediction is effective.
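HiGPPIM's atom-level step relies on graph attention networks. A minimal NumPy sketch of one single-head graph attention layer follows, plus a mean-pooling step that lifts atom embeddings to functional-group nodes. The toy five-atom graph, the two-group assignment matrix, and the use of plain mean pooling are illustrative assumptions; HiGPPIM's actual functional-group construction and its hypergraph attention network are not reproduced here.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(h, adj, W, a):
    """Single-head graph attention layer.

    h: (n, d_in) node features, adj: (n, n) 0/1 adjacency,
    W: (d_in, d_out) projection, a: (2 * d_out,) attention vector.
    """
    z = h @ W
    d_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), computed for all pairs at once
    e = leaky_relu((z @ a[:d_out])[:, None] + (z @ a[d_out:])[None, :])
    mask = (adj + np.eye(len(adj))) > 0       # each atom attends to its bonded neighbours and itself
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)      # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


def group_pool(h_atoms, assign):
    """Mean-pool atom embeddings into group nodes; assign is (n_groups, n_atoms) with 0/1 entries."""
    return (assign @ h_atoms) / assign.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 1],
                [0, 0, 1, 0, 0],
                [0, 0, 1, 0, 0]], dtype=float)  # toy 5-atom molecular graph
h = rng.normal(size=(5, 4))
atom_emb, alpha = gat_layer(h, adj, rng.normal(size=(4, 3)), rng.normal(size=6))
assign = np.array([[1, 1, 0, 0, 0],
                   [0, 0, 1, 1, 1]], dtype=float)  # hypothetical split into two functional groups
group_emb = group_pool(atom_emb, assign)
print(atom_emb.shape, group_emb.shape)  # (5, 3) (2, 3)
```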
4
Velloso JPL, de Sá AGC, Pires DEV, Ascher DB. Engineering G protein-coupled receptors for stabilization. Protein Sci 2024; 33:e5000. [PMID: 38747401 PMCID: PMC11094779 DOI: 10.1002/pro.5000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 03/21/2024] [Accepted: 04/10/2024] [Indexed: 05/19/2024]
Abstract
G protein-coupled receptors (GPCRs) are one of the most important families of targets for drug discovery. One of the limiting steps in the study of GPCRs has been their stability, with significant and time-consuming protein engineering often used to stabilize GPCRs for structural characterization and drug screening. Unfortunately, computational methods developed using globular soluble proteins have translated poorly to the rational engineering of GPCRs. To fill this gap, we propose GPCR-tm, a novel and personalized structurally driven web-based machine learning tool to study the impacts of mutations on GPCR stability. We show that GPCR-tm performs as well as or better than alternative methods, and that it can accurately rank the stability changes of a wide range of mutations occurring in various types of class A GPCRs. GPCR-tm achieved Pearson's correlation coefficients of 0.74 and 0.46 on 10-fold cross-validation and blind test sets, respectively. We observed that the (structural) graph-based signatures were the most important set of features for predicting destabilizing mutations, which points out that these signatures properly describe the changes in the environment where the mutations occur. More specifically, GPCR-tm was able to accurately rank mutations based on their effect on protein stability, guiding their rational stabilization. GPCR-tm is available through a user-friendly web server at https://biosig.lab.uq.edu.au/gpcr_tm/.
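GPCR-tm is evaluated with Pearson's correlation under 10-fold cross-validation and on a blind test set. The plain-Python sketch below shows the two ingredients: the Pearson coefficient between predicted and measured stability changes, and a shuffled 10-fold split of mutation indices. The function names and the seed are illustrative; the abstract does not say whether the cross-validation correlation is computed per fold or on pooled out-of-fold predictions, so the sketch makes no claim about GPCR-tm's exact protocol.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


folds = kfold_indices(25, k=10)
print([len(f) for f in folds])                      # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```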
Affiliation(s)
- João Paulo L. Velloso
- School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria, Australia
- Alex G. C. de Sá
- School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria, Australia
- Douglas E. V. Pires
- School of Computing and Information Systems, The University of Melbourne, Parkville, Victoria, Australia
- David B. Ascher
- School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria, Australia
5
Lin TE, Yen D, HuangFu W, Wu Y, Hsu J, Yen S, Sung T, Hsieh J, Pan S, Yang C, Huang W, Hsu K. An ensemble machine learning model generates a focused screening library for the identification of CDK8 inhibitors. Protein Sci 2024; 33:e5007. [PMID: 38723187 PMCID: PMC11081523 DOI: 10.1002/pro.5007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 03/26/2024] [Accepted: 04/13/2024] [Indexed: 05/13/2024]
Abstract
The identification of an effective inhibitor is an important starting step in drug development. Unfortunately, many issues such as the characterization of protein binding sites, the screening library, materials for assays, etc., make drug screening a difficult proposition. As screening libraries grow, more resources are consumed inefficiently. Thus, new strategies are needed to preprocess and focus a screening library towards a targeted protein. Herein, we report an ensemble machine learning (ML) model to generate a CDK8-focused screening library. The ensemble model consists of six different algorithms optimized for CDK8 inhibitor classification. The models were trained using a CDK8-specific fragment library along with molecules with known CDK8 activity. The optimized ensemble model processed a commercial library containing 1.6 million molecules. This resulted in a CDK8-focused screening library containing 1,672 molecules, a reduction of more than 99.90%. The CDK8-focused library was then subjected to molecular docking, and 25 candidate compounds were selected. Enzymatic assays confirmed six CDK8 inhibitors, with one compound producing an IC50 value of ≤100 nM. Analysis of the ensemble ML model reveals the role of the CDK8 fragment library during training. Structural analysis of molecules reveals the hit compounds to be structurally novel CDK8 inhibitors. Together, the results highlight a pipeline for curating a focused library for a specific protein target, such as CDK8.
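The abstract states that six algorithms are combined but not how their outputs are merged. One common choice is soft voting, sketched below in plain Python under that assumption: each model's predicted probability of CDK8 inhibition is averaged per molecule, and molecules whose mean clears a threshold are kept in the focused library. The probabilities, the 0.5 threshold, and the function name are hypothetical.

```python
from statistics import mean


def soft_vote(prob_lists, threshold=0.5):
    """Average per-model probabilities for each molecule, then keep molecules at or above threshold.

    prob_lists holds one list per model; entry i of every list refers to the same molecule.
    """
    n_mol = len(prob_lists[0])
    avg = [mean(model[i] for model in prob_lists) for i in range(n_mol)]
    keep = [i for i, p in enumerate(avg) if p >= threshold]
    return avg, keep


# Hypothetical outputs of six classifiers for three library molecules.
probs = [
    [0.90, 0.20, 0.60],
    [0.80, 0.10, 0.50],
    [0.70, 0.30, 0.50],
    [0.95, 0.20, 0.70],
    [0.85, 0.10, 0.40],
    [0.80, 0.30, 0.60],
]
avg, keep = soft_vote(probs)
print(keep)  # [0, 2]
print(f"retained {len(keep)} of {len(avg)} molecules")
```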
Affiliation(s)
- Tony Eight Lin
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Dyan Yen
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Wei-Chun HuangFu
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei, Taiwan
- Yi-Wen Wu
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Jui-Yi Hsu
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Shih-Chung Yen
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong, People's Republic of China
- Tzu-Ying Sung
- Biomedical Translation Research Center, Academia Sinica, Taipei, Taiwan
- Jui-Hua Hsieh
- Division of Translational Toxicology, National Institute of Environmental Health Sciences, National Institutes of Health, Durham, North Carolina, USA
- Shiow-Lin Pan
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei, Taiwan
- Chia-Ron Yang
- School of Pharmacy, College of Medicine, National Taiwan University, Taipei, Taiwan
- Wei-Jan Huang
- Graduate Institute of Pharmacognosy, College of Pharmacy, Taipei Medical University, Taipei, Taiwan
- Kai-Cheng Hsu
- Graduate Institute of Cancer Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Ph.D. Program for Cancer Molecular Biology and Drug Discovery, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- TMU Research Center of Cancer Translational Medicine, Taipei Medical University, Taipei, Taiwan
- Cancer Center, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
6
Zhang Z, Zhao L, Gao M, Chen Y, Wang J, Wang C. PPII-AEAT: Prediction of protein-protein interaction inhibitors based on autoencoders with adversarial training. Comput Biol Med 2024; 172:108287. [PMID: 38503089 DOI: 10.1016/j.compbiomed.2024.108287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 02/21/2024] [Accepted: 03/12/2024] [Indexed: 03/21/2024]
Abstract
Protein-protein interactions (PPIs) have shown increasing potential as novel drug targets. The design and development of small molecule inhibitors targeting specific PPIs are crucial for the prevention and treatment of related diseases. Accordingly, effective computational methods are highly desired to meet the emerging need for the large-scale accurate prediction of PPI inhibitors. However, existing machine learning models rely heavily on the manual screening of features and lack generalizability. Here, we propose a new PPI inhibitor prediction method based on autoencoders with adversarial training (named PPII-AEAT) that can adaptively learn molecular representations to cope with different PPI targets. First, extended-connectivity fingerprints and Mordred descriptors are employed to extract the primary features of small-molecule compounds. Then, an autoencoder architecture is trained in three phases to learn high-level representations and predict inhibitory scores. We evaluate PPII-AEAT on nine PPI targets and two different tasks: PPI inhibitor identification and inhibitory potency prediction. The experimental results show that our proposed PPII-AEAT outperforms state-of-the-art methods.
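The core building block of PPII-AEAT is an autoencoder over fingerprint and descriptor features. The NumPy sketch below trains a plain one-hidden-layer autoencoder (tanh encoder, linear decoder, mean-squared reconstruction loss, full-batch gradient descent) on a random binary matrix standing in for ECFP bits. It deliberately omits the adversarial training and the three-phase schedule described in the abstract, and every size and hyperparameter shown is an assumption.

```python
import numpy as np


def train_autoencoder(X, d_latent=4, lr=0.1, epochs=300, seed=0):
    """Plain autoencoder trained by full-batch gradient descent on mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, d_latent))
    b1 = np.zeros(d_latent)
    W2 = rng.normal(scale=0.1, size=(d_latent, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder: latent representation
        X_hat = H @ W2 + b2               # decoder: reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size      # d(loss)/d(X_hat)
        g_W2, g_b2 = H.T @ g_out, g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)  # back through tanh
        g_W1, g_b1 = X.T @ g_pre, g_pre.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses


# Stand-in for fingerprint bits: 32 molecules x 16 bits (random, not real ECFPs).
X = (np.random.default_rng(42).random((32, 16)) < 0.3).astype(float)
encode, losses = train_autoencoder(X)
print(encode(X).shape)  # (32, 4)
print(losses[0] > losses[-1])
```

The latent codes returned by `encode` are what a downstream inhibitor classifier or potency regressor would consume in a setup like this.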
Affiliation(s)
- Zitong Zhang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Lingling Zhao
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Mengyao Gao
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Yuanlong Chen
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
- Junjie Wang
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150001, China
7
Velloso JPL, Kovacs AS, Pires DEV, Ascher DB. AI-driven GPCR analysis, engineering, and targeting. Curr Opin Pharmacol 2024; 74:102427. [PMID: 38219398 DOI: 10.1016/j.coph.2023.102427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/16/2024]
Abstract
This article investigates the role of recent advances in Artificial Intelligence (AI) in revolutionising the study of G protein-coupled receptors (GPCRs). AI has been applied to many areas of GPCR research, including the application of machine learning (ML) in GPCR classification, prediction of GPCR activation levels, modelling GPCR 3D structures and interactions, understanding G-protein selectivity, aiding elucidation of GPCR structures, and drug design. Despite this progress, challenges in predicting GPCR structures and addressing the complex nature of GPCRs remain, providing avenues for future research and development.
Affiliation(s)
- João P L Velloso
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Aaron S Kovacs
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
8
Abstract
The greatest challenge in drug discovery remains the high rate of attrition across the different phases of the process, which costs the industry billions of dollars every year. While all phases remain crucial to ensure pharmaceutical-level safety, quality, and efficacy of the end product, streamlining these efforts toward compounds with the potential to succeed is pivotal for a more efficient and cost-effective process. The use of artificial intelligence (AI) within the pharmaceutical industry aims at just this, and has applications in preclinical screening for biological activity, optimization of pharmacokinetic properties for improved drug formulation, early toxicity prediction to reduce attrition, and pre-emptive screening for genetic changes in the biological target to improve therapeutic longevity. Here, we present a series of in silico tools that address these applications in small molecule development and describe how they can be embedded within the current pharmaceutical development pipeline.
Affiliation(s)
- Adam Serghini
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Stephanie Portelli
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
9
Sun H, Wang J, Wu H, Lin S, Chen J, Wei J, Lv S, Xiong Y, Wei DQ. A Multimodal Deep Learning Framework for Predicting PPI-Modulator Interactions. J Chem Inf Model 2023; 63:7363-7372. [PMID: 38037990 DOI: 10.1021/acs.jcim.3c01527] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/02/2023]
Abstract
Protein-protein interactions (PPIs) are essential for various biological processes and diseases. However, most existing computational methods for identifying PPI modulators require either target structure or reference modulators, which restricts their applicability to novel PPI targets. To address this challenge, we propose MultiPPIMI, a sequence-based deep learning framework that predicts the interaction between any given PPI target and modulator. MultiPPIMI integrates multimodal representations of PPI targets and modulators and uses a bilinear attention network to capture intermolecular interactions. Experimental results on our curated benchmark data set show that MultiPPIMI achieves an average AUROC of 0.837 in three cold-start scenarios and an AUROC of 0.994 in the random-split scenario. Furthermore, the case study shows that MultiPPIMI can assist molecular docking simulations in screening inhibitors of the Keap1/Nrf2 PPI. We believe that the proposed method provides a promising way to screen PPI-targeted modulators.
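MultiPPIMI captures target-modulator interactions with a bilinear attention network. The NumPy sketch below shows the low-rank bilinear attention idea in its simplest single-glimpse form: both token sets are projected into a shared k-dimensional space, a bilinear score is computed for every (target token, modulator token) pair, a softmax over all pairs gives the attention map, and the map weights an elementwise joint representation. The shapes, the random embeddings standing in for the sequence and molecule encoders, and the function name are assumptions rather than MultiPPIMI's configuration.

```python
import numpy as np


def bilinear_attention(P, M, W_p, W_m, q):
    """Low-rank bilinear attention between target tokens P (n_p, d_p) and modulator tokens M (n_m, d_m).

    Score for pair (i, j): sum_k q_k * relu(P W_p)_ik * relu(M W_m)_jk, softmax-normalised over all pairs.
    """
    Hp = np.maximum(0.0, P @ W_p)            # (n_p, k)
    Hm = np.maximum(0.0, M @ W_m)            # (n_m, k)
    logits = (Hp * q) @ Hm.T                 # (n_p, n_m) bilinear pair scores
    A = np.exp(logits - logits.max())
    A = A / A.sum()                          # attention map over every (i, j) pair
    joint = np.einsum("ij,ik,jk->k", A, Hp, Hm)  # attention-weighted joint feature, shape (k,)
    return A, joint


rng = np.random.default_rng(0)
P = rng.normal(size=(30, 20))   # e.g. 30 residue embeddings of a PPI target (random stand-ins)
M = rng.normal(size=(12, 10))   # e.g. 12 atom embeddings of a modulator (random stand-ins)
k = 8
A, joint = bilinear_attention(P, M, rng.normal(size=(20, k)), rng.normal(size=(10, k)), rng.normal(size=k))
print(A.shape, joint.shape)  # (30, 12) (8,)
```

In a full model the joint vector would feed a classification head; using several attention vectors (several glimpses) is a common extension.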
Affiliation(s)
- Heqi Sun
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea
- Hongyan Wu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Shenggeng Lin
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Junwei Chen
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Jinghua Wei
- Department of Chemistry, University of Toronto, Toronto M5R 0A3, Canada
- Shuai Lv
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Peng Cheng National Laboratory, Shenzhen 518055, China
- Zhongjing Research and Industrialization Institute of Chinese Medicine, Nanyang 473006, China
10
Venkatraman V. FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools. Front Chem 2023; 11:1239467. [PMID: 37649967 PMCID: PMC10462816 DOI: 10.3389/fchem.2023.1239467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023]
Abstract
Discovering new drugs for disease treatment is challenging, requiring a multidisciplinary effort as well as time and resources. With a view to improving hit discovery and lead compound identification, machine learning (ML) approaches are being increasingly used in the decision-making process. Although a number of ML-based studies have been published, most report only fragments of the wider range of bioactivities, with each model typically focusing on a particular disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models that covers a diverse range of activities including neglected tropical diseases (caused by viral, bacterial and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer's. To arrive at the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best-performing models, which achieved test set AUC values of 0.62-0.99, have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap.
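Models built on binary molecular fingerprints are often compared against, or baselined with, Tanimoto similarity. The plain-Python sketch below represents each fingerprint as the set of its on-bit indices, computes the Tanimoto coefficient, and uses it in a 1-nearest-neighbour predictor. The bit indices and labels are hypothetical, and nearest-neighbour lookup is only an illustrative baseline, not one of the ≈4,000 models evaluated in FP-MAP.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two binary fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty fingerprints are treated as 0.0 here


def nearest_neighbour_label(query_fp, training_set):
    """Return the label of the training fingerprint most Tanimoto-similar to the query."""
    _, best_label = max(training_set, key=lambda item: tanimoto(query_fp, item[0]))
    return best_label


train = [({1, 2, 3}, "active"), ({7, 8, 9}, "inactive")]
print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))          # 0.4
print(nearest_neighbour_label({1, 2, 4}, train))  # active
```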
Affiliation(s)
- Vishwesh Venkatraman
- Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
11
Liang J, Wu Y, Lan K, Dong C, Wu S, Li S, Zhou HB. Antiviral PROTACs: Opportunity borne with challenge. Cell Insight 2023; 2:100092. [PMID: 37398636 PMCID: PMC10308200 DOI: 10.1016/j.cellin.2023.100092] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 12/23/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 07/04/2023]
Abstract
Proteolysis targeting chimera (PROTAC)-mediated degradation of pathogenic proteins by hijacking the ubiquitin-proteasome system has become a promising strategy in drug design. The overwhelming advantages of PROTAC technology have ensured a rapid and wide usage, and multiple PROTACs have entered clinical trials. Several antiviral PROTACs have been developed with promising bioactivities against various pathogenic viruses. However, the number of reported antiviral PROTACs is far less than that of other diseases, e.g., cancers, immune disorders, and neurodegenerative diseases, possibly because of the common deficiencies of PROTAC technology (e.g., limited available ligands and poor membrane permeability) plus the complex mechanism involved and the high tendency of viral mutation during transmission and replication, which may challenge the successful development of effective antiviral PROTACs. This review highlights the important advances in this rapidly growing field and critical limitations encountered in developing antiviral PROTACs by analyzing the current status and representative examples of antiviral PROTACs and other PROTAC-like antiviral agents. We also summarize and analyze the general principles and strategies for antiviral PROTAC design and optimization with the intent of indicating the potential strategic directions for future progress.
Affiliation(s)
- Jinsen Liang
- Medical Research Institute, Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, 430071, China
- Yihe Wu
- Provincial Key Laboratory of Developmentally Originated Disease, Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (MOE) and Hubei Province Engineering and Technology Research Center for Fluorinated Pharmaceuticals, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430071, China
- Ke Lan
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Chune Dong
- Medical Research Institute, Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, 430071, China
- Shuwen Wu
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Shu Li
- Medical Research Institute, Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, 430071, China
- Hai-Bing Zhou
- Medical Research Institute, Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, 430071, China
- Provincial Key Laboratory of Developmentally Originated Disease, Key Laboratory of Combinatorial Biosynthesis and Drug Discovery (MOE) and Hubei Province Engineering and Technology Research Center for Fluorinated Pharmaceuticals, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430071, China
12
Shen L, Feng H, Qiu Y, Wei GW. SVSBI: sequence-based virtual screening of biomolecular interactions. Commun Biol 2023; 6:536. [PMID: 37202415 PMCID: PMC10195826 DOI: 10.1038/s42003-023-04866-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/10/2023] [Accepted: 04/24/2023] [Indexed: 05/20/2023]
Abstract
Virtual screening (VS) is a critical technique in understanding biomolecular interactions, particularly in drug design and discovery. However, the accuracy of current VS models relies heavily on three-dimensional (3D) structures obtained through molecular docking, which are often unreliable because of docking's low accuracy. To address this issue, we introduce sequence-based virtual screening (SVS) as a new generation of VS models that use advanced natural language processing (NLP) algorithms and optimized deep K-embedding strategies to encode biomolecular interactions without relying on 3D structure-based docking. We demonstrate that SVS outperforms state-of-the-art methods on four regression datasets involving protein-ligand binding, protein-protein binding, protein-nucleic acid binding, and ligand inhibition of protein-protein interactions, and on five classification datasets for protein-protein interactions in five biological species. SVS has the potential to transform current practices in drug discovery and protein engineering.
Affiliation(s)
- Li Shen
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Hongsong Feng
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
13
Gao M, Zhao L, Zhang Z, Wang J, Wang C. Using a stacked ensemble learning framework to predict modulators of protein-protein interactions. Comput Biol Med 2023; 161:107032. [PMID: 37230018 DOI: 10.1016/j.compbiomed.2023.107032] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/14/2023] [Revised: 04/13/2023] [Accepted: 05/10/2023] [Indexed: 05/27/2023]
Abstract
Identifying small-molecule protein-protein interaction modulators (PPIMs) is a highly promising and meaningful research direction for drug discovery, cancer treatment, and other fields. In this study, we developed a stacked ensemble computational framework, SELPPI, based on a genetic algorithm and tree-based machine learning methods, for effectively predicting new modulators targeting protein-protein interactions. More specifically, extremely randomized trees (ExtraTrees), adaptive boosting (AdaBoost), random forest (RF), cascade forest, light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost) were used as base learners. Seven types of chemical descriptors were used as input features. Primary predictions were obtained with each base learner-descriptor pair. Then, each of the six methods mentioned above was trained in turn as a meta learner on the primary predictions, and the most effective one was chosen as the meta learner. Finally, the genetic algorithm was used to select the optimal primary prediction outputs as input to the meta learner for secondary prediction, yielding the final result. We systematically evaluated our model on the pdCSM-PPI datasets. To our knowledge, our model outperformed all existing models, which demonstrates its strong predictive power.
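The final SELPPI step above uses a genetic algorithm to choose which primary predictions feed the meta learner. As a minimal sketch of that kind of selection, assuming the primary predictions are already available as a matrix of probabilities (one column per base learner-descriptor pair) with binary labels, the Python below evolves 0/1 column masks. The fitness function is a hypothetical stand-in: it averages the selected probabilities and thresholds at 0.5, whereas SELPPI uses the best-performing tree-based method as its meta learner. The names `mean_vote_accuracy` and `ga_select`, the toy data, the population size, and the mutation rate are illustrative assumptions, not taken from the paper.

```python
import random
from typing import List, Sequence


# Hypothetical stand-in meta learner: average the selected primary
# predictions and call a sample positive if the mean is >= 0.5.
def mean_vote_accuracy(preds: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       mask: Sequence[int]) -> float:
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:  # an empty selection cannot predict anything
        return 0.0
    correct = 0
    for row, y in zip(preds, labels):
        score = sum(row[j] for j in cols) / len(cols)
        correct += int((score >= 0.5) == bool(y))
    return correct / len(labels)


def ga_select(preds: Sequence[Sequence[float]],
              labels: Sequence[int],
              pop_size: int = 20,
              generations: int = 30,
              mutation_rate: float = 0.05,
              seed: int = 0) -> List[int]:
    """Evolve a 0/1 mask over prediction columns (needs >= 2 columns, pop_size >= 4)."""
    rng = random.Random(seed)
    n_cols = len(preds[0])

    def fitness(mask: Sequence[int]) -> float:
        return mean_vote_accuracy(preds, labels, mask)

    population = [[rng.randint(0, 1) for _ in range(n_cols)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the fitter half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_cols)  # single-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)


# Toy primary predictions: column 0 is informative, column 1 is noise,
# column 2 is systematically wrong.
labels = [0, 1, 0, 1, 1, 0, 1, 0]
preds = [
    [0.1, 0.2, 0.9], [0.9, 0.5, 0.1], [0.1, 0.8, 0.9], [0.9, 0.2, 0.1],
    [0.9, 0.5, 0.1], [0.1, 0.8, 0.9], [0.9, 0.2, 0.1], [0.1, 0.5, 0.9],
]
best = ga_select(preds, labels)
print(best, mean_vote_accuracy(preds, labels, best))
```

In a real stacking pipeline the fitness would be computed on out-of-fold primary predictions rather than on predictions for samples the base learners were trained on; otherwise the selected subset tends to overfit.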
Affiliation(s)
- Mengyao Gao
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Lingling Zhao
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Zitong Zhang
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Junjie Wang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
14
Zhang M, Xue B, Li Q, Shi R, Cao Y, Wang W, Li J. Sequence Tendency for the Interaction between Low-Complexity Intrinsically Disordered Proteins. JACS Au 2023; 3:93-104. [PMID: 36711093 PMCID: PMC9875249 DOI: 10.1021/jacsau.2c00414] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/25/2022] [Revised: 12/16/2022] [Accepted: 12/16/2022] [Indexed: 06/18/2023]
Abstract
Reversible interaction between intrinsically disordered proteins (IDPs) is considered the driving force for liquid-liquid phase separation (LLPS), yet a detailed description of such a transient interaction process remains a challenge. The mechanisms underlying IDP interaction, for example, its possible relationship with inherent conformational fluctuations and sequence features, also remain elusive. Here, we use atomistic molecular dynamics (MD) simulation to investigate the reversible association of the LAF-1 RGG domain, an IDP with an ultra-low LLPS concentration (0.06 mM). We find that the duration of association between two RGG domains is highly heterogeneous, and that sustained associations essentially dominate the IDP interaction. More interestingly, these sustained associations are mediated by a finite region, namely the C-terminal region 138-168 (denoted as a contact-prone region). We found that this sequence tendency is attributable to the extended conformation of the RGG domain during its inherent conformational fluctuations. Hence, our results suggest that a certain region in this low-complexity IDP can essentially dominate the interaction between these IDPs and should also be important to LLPS. Moreover, the inherent conformational fluctuations are essential for the emergence of such a hot region of IDP interaction. The importance of this hot region to LLPS is verified by experiment.
Affiliation(s)
- Moxin Zhang
- Zhejiang Province Key Laboratory of Quantum Technology and Device, School of Physics, Zhejiang University, Hangzhou 310058, China
- Bin Xue
- Collaborative Innovation Center of Advanced Microstructures, National Laboratory of Solid State Microstructure, Department of Physics, Nanjing University, Nanjing 210093, China
- Qingtai Li
- Collaborative Innovation Center of Advanced Microstructures, National Laboratory of Solid State Microstructure, Department of Physics, Nanjing University, Nanjing 210093, China
- Rui Shi
- Zhejiang Province Key Laboratory of Quantum Technology and Device, School of Physics, Zhejiang University, Hangzhou 310058, China
- Yi Cao
- Collaborative Innovation Center of Advanced Microstructures, National Laboratory of Solid State Microstructure, Department of Physics, Nanjing University, Nanjing 210093, China
- Wei Wang
- Collaborative Innovation Center of Advanced Microstructures, National Laboratory of Solid State Microstructure, Department of Physics, Nanjing University, Nanjing 210093, China
- Jingyuan Li
- Zhejiang Province Key Laboratory of Quantum Technology and Device, School of Physics, Zhejiang University, Hangzhou 310058, China
15
Iftkhar S, de Sá AGC, Velloso JPL, Aljarf R, Pires DEV, Ascher DB. cardioToxCSM: A Web Server for Predicting Cardiotoxicity of Small Molecules. J Chem Inf Model 2022; 62:4827-4836. [PMID: 36219164 DOI: 10.1021/acs.jcim.2c00822] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/28/2022]
Abstract
The design of novel, safe, and effective drugs to treat human diseases is a challenging venture, with toxicity being one of the main sources of attrition at later stages of development. Failure due to toxicity significantly increases costs and time to market, and multiple drugs have been withdrawn from the market because of their adverse effects. Cardiotoxicity, for instance, was responsible for the failure of drugs such as fenspiride, propoxyphene, and valdecoxib. While significant effort has been dedicated to mitigating this issue by developing computational approaches that aim to identify molecules likely to be toxic, including quantitative structure-activity relationship models and machine learning methods, current approaches offer limited performance and interpretability. To overcome these limitations, we propose a new web-based computational method, cardioToxCSM, which can efficiently and accurately predict six types of cardiac toxicity outcomes: arrhythmia, cardiac failure, heart block, hERG toxicity, hypertension, and myocardial infarction. cardioToxCSM was developed using graph-based signatures, molecular descriptors, toxicophore matchings, and molecular fingerprints, leveraging explainable machine learning, and was validated internally via different cross-validation schemes and externally via low-redundancy blind sets. The models showed robust performance, with areas under the ROC curve of up to 0.898 on 5-fold cross-validation, consistent with metrics on blind tests. Additionally, our models help interpret predictions by identifying whether substructures commonly enriched in toxic compounds are present. We believe cardioToxCSM will provide valuable insight into the potential cardiotoxicity of small molecules early in drug screening efforts. The method is freely available as a web server at https://biosig.lab.uq.edu.au/cardiotoxcsm.
Affiliation(s)
- Saba Iftkhar
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Alex G C de Sá
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
- João P L Velloso
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Raghad Aljarf
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
16
de Sá AGC, Long Y, Portelli S, Pires DEV, Ascher DB. toxCSM: comprehensive prediction of small molecule toxicity profiles. Brief Bioinform 2022; 23:6673851. [PMID: 35998885 DOI: 10.1093/bib/bbac337] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/19/2022] [Revised: 07/17/2022] [Accepted: 07/23/2022] [Indexed: 01/29/2023]
Abstract
Drug discovery is a lengthy, costly and high-risk endeavour that is further complicated by high attrition rates in later development stages. Toxicity has been one of the main causes of failure during clinical trials, increasing drug development time and costs. To facilitate early identification and optimisation of toxicity profiles, several computational tools have emerged that aim to improve success rates by pre-screening drug candidates in a timely manner. Despite these efforts, there is an increasing demand for platforms capable of assessing both environmental and human-based toxicity properties at large scale. Here, we present toxCSM, a comprehensive computational platform for the study and optimisation of toxicity profiles of small molecules. toxCSM leverages the well-established concepts of graph-based signatures, molecular descriptors and similarity scores to develop 36 models for predicting a range of toxicity properties, which can assist in developing safer drugs and agrochemicals. toxCSM achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of up to 0.99 and Pearson's correlation coefficients of up to 0.94 on 10-fold cross-validation, with comparable performance on blind test sets, outperforming all alternative methods. toxCSM is freely available as a user-friendly web server and API at http://biosig.lab.uq.edu.au/toxcsm.
Affiliation(s)
- Alex G C de Sá
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland, 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria, 3010, Australia
- Yangyang Long
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, 3052, Australia
- Stephanie Portelli
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland, 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, 3052, Australia
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland, 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, Victoria, 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, Victoria, 3010, Australia
17
Pires DEV, Stubbs KA, Mylne JS, Ascher DB. cropCSM: designing safe and potent herbicides with graph-based signatures. Brief Bioinform 2022; 23:bbac042. [PMID: 35211724 PMCID: PMC9155605 DOI: 10.1093/bib/bbac042] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/09/2021] [Revised: 01/26/2022] [Accepted: 01/27/2022] [Indexed: 12/11/2022]
Abstract
Herbicides have revolutionised weed management, increased crop yields and improved profitability, allowing for an increase in worldwide food security. Their widespread use, however, has also led to a rise in resistance and concerns about their environmental impact. Despite the need for potent and safe herbicidal molecules, no herbicide with a new mode of action has reached the market in 30 years. Although computational approaches have proven invaluable in guiding rational drug discovery pipelines, leading to higher hit rates and lower attrition due to poor toxicity, comparatively little has been done for herbicide design. To fill this gap, we have developed cropCSM, a computational platform to help identify new, potent, nontoxic and environmentally safe herbicides. Using a knowledge-based approach, we identified physicochemical properties and substructures enriched in safe herbicides. By representing small molecules as graphs, we leveraged these insights to guide the development of predictive models trained and tested on the largest data set collected to date of molecules with experimentally characterised herbicidal profiles (over 4500 compounds). In addition, we developed six new environmental and human toxicity predictors, spanning five different species, to assist in molecule prioritisation. cropCSM correctly identified 97% of herbicides currently available commercially, while predicting toxicity profiles with accuracies of up to 92%. We believe cropCSM will be an essential tool for the enrichment of screening libraries and for guiding the development of potent and safe herbicides. We have made the method freely available through a user-friendly webserver at http://biosig.unimelb.edu.au/crop_csm.
Affiliation(s)
- Douglas E V Pires
- School of Computing and Information Systems at the University of Melbourne
- Keith A Stubbs
- School of Molecular Sciences at the University of Western Australia
- Joshua S Mylne
- Curtin University and Deputy Director of the Centre for Crop and Disease Management
- David B Ascher
- University of Queensland, and head of Computational Biology and Clinical Informatics at the Baker Institute and Systems