1
MMSyn: A New Multimodal Deep Learning Framework for Enhanced Prediction of Synergistic Drug Combinations. J Chem Inf Model 2024; 64:3689-3705. [PMID: 38676916 DOI: 10.1021/acs.jcim.4c00165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2024]
Abstract
Combination therapy is a promising strategy for the successful treatment of cancer. The large number of possible combinations, however, means that it is laborious and expensive to screen for synergistic drug combinations in vitro. Nevertheless, because of the availability of high-throughput screening data and advances in computational techniques, deep learning (DL) can be a useful tool for the prediction of synergistic drug combinations. In this study, we proposed a multimodal DL framework, MMSyn, for the prediction of synergistic drug combinations. First, features embedded in the drug molecules were extracted: structure, fingerprint, and string encoding. Then, gene expression data, DNA copy number, and pathway activity were used to describe cancer cell lines. Finally, these processed features were integrated using an attention mechanism and an interaction module and then input into a multilayer perceptron to predict drug synergy. Experimental results showed that our method outperformed five state-of-the-art DL methods and three traditional machine learning models for drug combination prediction. We verified that MMSyn achieved superior performance in stratified cross-validation settings using both the drug combination and cell line data. Moreover, we performed a set of ablation experiments to illustrate the effectiveness of each component and the efficacy of our model. In addition, our visual representation and case studies further confirmed the effectiveness of our model. All results showed that MMSyn can be used as a powerful tool for the prediction of synergistic drug combinations.
2
Enhancing property and activity prediction and interpretation using multiple molecular graph representations with MMGX. Commun Chem 2024; 7:74. [PMID: 38580841 PMCID: PMC10997661 DOI: 10.1038/s42004-024-01155-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 03/18/2024] [Indexed: 04/07/2024] Open
Abstract
Graph Neural Networks (GNNs) excel in compound property and activity prediction, but the choice of molecular graph representations significantly influences model learning and interpretation. While atom-level molecular graphs resemble natural topology, they overlook key substructures or functional groups, and their interpretation only partially aligns with chemical intuition. Recent research suggests alternative representations that use reduced molecular graphs to integrate higher-level chemical information and leverage both representations for modeling. However, there is a lack of studies on the applicability and impact of different molecular graphs on model learning and interpretation. Here, we introduce MMGX (Multiple Molecular Graph eXplainable discovery), investigating the effects of multiple molecular graphs, including Atom, Pharmacophore, JunctionTree, and FunctionalGroup, on model learning and interpretation from various perspectives. Our findings indicate that multiple graphs improve model performance, though to varying degrees depending on the dataset. Interpretation from multiple graphs in different views provides more comprehensive features and potential substructures consistent with background knowledge. These results help to understand model decisions and offer valuable insights for subsequent tasks. The concept of multiple molecular graph representations and diverse interpretation perspectives has broad applicability across tasks, architectures, and explanation techniques, enhancing model learning and interpretation for relevant applications in drug discovery.
3
Predicting drug-induced liver injury using graph attention mechanism and molecular fingerprints. Methods 2024; 221:18-26. [PMID: 38040204 DOI: 10.1016/j.ymeth.2023.11.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/14/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023] Open
Abstract
Drug-induced liver injury (DILI) is a significant issue in drug development and clinical treatment due to its potential to cause liver dysfunction or damage, which, in severe cases, can lead to liver failure or even fatality. DILI has numerous pathogenic factors, many of which remain incompletely understood. Consequently, it is imperative to devise methodologies and tools for anticipatory assessment of DILI risk in the initial phases of drug development. In this study, we present DMFPGA, a novel deep learning predictive model designed to predict DILI. To provide a comprehensive description of molecular properties, we employ a multi-head graph attention mechanism to extract features from the molecular graphs, representing characteristics at the level of compound nodes. Additionally, we combine multiple fingerprints of molecules to capture features at the molecular level of compounds. The fusion of molecular fingerprints and graph features can more fully express the properties of compounds. Subsequently, we employ a fully connected neural network to classify compounds as either DILI-positive or DILI-negative. To rigorously evaluate DMFPGA's performance, we conduct a 5-fold cross-validation experiment. The obtained results demonstrate the superiority of our method over four existing state-of-the-art computational approaches, exhibiting an average AUC of 0.935 and an average ACC of 0.934. We believe that DMFPGA is helpful for early-stage DILI prediction and assessment in drug development.
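The 5-fold cross-validation protocol mentioned in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the DMFPGA code: the function names and the `evaluate` callback are hypothetical, and the abstract does not say how the folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, evaluate, k=5, seed=0):
    """Average a metric over k train/test splits.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a
    model on train_idx and returns a score (for example AUC) on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i forms the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

Each sample is used for testing exactly once, so the averaged score reflects performance on the whole dataset rather than on a single held-out split.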
4
MVML-MPI: Multi-View Multi-Label Learning for Metabolic Pathway Inference. Brief Bioinform 2023; 24:bbad393. [PMID: 37930024 DOI: 10.1093/bib/bbad393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/20/2023] [Accepted: 10/11/2023] [Indexed: 11/07/2023] Open
Abstract
Development of robust and effective strategies for synthesizing new compounds, drug targeting and constructing GEnome-scale Metabolic models (GEMs) requires a deep understanding of the underlying biological processes. A critical step in achieving this goal is accurately identifying the categories of pathways in which a compound participates. However, current machine learning-based methods often overlook the multifaceted nature of compounds, resulting in inaccurate pathway predictions. Therefore, we present a novel framework for Multi-View Multi-Label Learning for Metabolic Pathway Inference, named MVML-MPI. First, MVML-MPI learns the distinct compound representations in parallel with corresponding compound encoders to fully extract features. Subsequently, we propose an attention-based mechanism that offers a fusion module to complement these multi-view representations. As a result, MVML-MPI accurately represents and effectively captures the complex relationship between compounds and metabolic pathways and distinguishes itself from current machine learning-based methods. In experiments conducted on the Kyoto Encyclopedia of Genes and Genomes pathways dataset, MVML-MPI outperformed state-of-the-art methods, demonstrating the superiority of MVML-MPI and its potential for use in the field of metabolic pathway design, which can aid in optimizing drug-like compounds and facilitating the development of GEMs. The code and data underlying this article are freely available at https://github.com/guofei-tju/MVML-MPI. Contact: jtang@cse.sc.edu, guofei@csu.edu.com or wuxi_dyj@csj.uestc.edu.cn.
5
A Comprehensive Comparative Analysis of Deep Learning Based Feature Representations for Molecular Taste Prediction. Foods 2023; 12:3386. [PMID: 37761095 PMCID: PMC10529232 DOI: 10.3390/foods12183386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 08/30/2023] [Accepted: 09/01/2023] [Indexed: 09/29/2023] Open
Abstract
Taste determination in small molecules is critical in food chemistry but traditional experimental methods can be time-consuming. Consequently, computational techniques have emerged as valuable tools for this task. In this study, we explore taste prediction using various molecular feature representations and assess the performance of different machine learning algorithms on a dataset comprising 2601 molecules. The results reveal that GNN-based models outperform other approaches in taste prediction. Moreover, consensus models that combine diverse molecular representations demonstrate improved performance. Among these, the molecular fingerprints + GNN consensus model emerges as the top performer, highlighting the complementary strengths of GNNs and molecular fingerprints. These findings have significant implications for food chemistry research and related fields. By leveraging these computational approaches, taste prediction can be expedited, leading to advancements in understanding the relationship between molecular structure and taste perception in various food components and related compounds.
6
Pharmacophore-Based Machine Learning Model To Predict Ligand Selectivity for E3 Ligase Binders. ACS OMEGA 2023; 8:30177-30185. [PMID: 37636935 PMCID: PMC10448689 DOI: 10.1021/acsomega.3c02803] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/26/2023] [Accepted: 07/06/2023] [Indexed: 08/29/2023]
Abstract
E3 ligases are enzymes that play a critical role in ubiquitin-mediated protein degradation and are involved in various cellular processes. Pharmacophore analysis is a useful approach for predicting E3 ligase binding selectivity, which involves identifying key chemical features necessary for a ligand to interact with a specific protein target cavity. While pharmacophore analysis is not always sufficient to accurately predict ligand binding affinity, it can be a valuable tool for filtering and/or designing focused libraries for screening campaigns. In this study, we present a fast and inexpensive approach using a pharmacophore fingerprinting scheme known as ErG, which is used in a multi-class machine learning classification model. This model can assign the correct E3 ligase binder to its known E3 ligase and predict the probability of each molecule binding to different E3 ligases. Practical applications of this approach are demonstrated on commercial libraries such as Asinex for the rational design of E3 ligase binders. The scripts and data associated with this study can be found on GitHub at https://github.com/Fraunhofer-ITMP/E3_binder_Model.
7
Hybrid neural network approaches to predict drug-target binding affinity for drug repurposing: screening for potential leads for Alzheimer's disease. Front Mol Biosci 2023; 10:1227371. [PMID: 37441162 PMCID: PMC10334190 DOI: 10.3389/fmolb.2023.1227371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 06/13/2023] [Indexed: 07/15/2023] Open
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease that primarily affects elderly individuals. Recent studies have found that sigma-1 receptor (S1R) agonists can maintain endoplasmic reticulum stress homeostasis, reduce neuronal apoptosis, and enhance mitochondrial function and autophagy, making S1R a target for AD therapy. Traditional experimental methods are costly and inefficient, and rapid and accurate prediction methods need to be developed, while drug repurposing provides new ways and options for AD treatment. In this paper, we propose HNNDTA, a hybrid neural network for drug-target affinity (DTA) prediction, to facilitate drug repurposing for AD treatment. The study combines protein-protein interaction (PPI) network analysis, the HNNDTA model, and molecular docking to identify potential leads for AD. The HNNDTA model was constructed using 13 drug encoding networks and 9 target encoding networks with 2506 FDA-approved drugs as the candidate drug library for S1R and related proteins. Seven potential drugs were identified using network pharmacology and DTA prediction results of the HNNDTA model. Molecular docking simulations were further performed using the AutoDock Vina tool to screen haloperidol and bromperidol as lead compounds for AD treatment. Absorption, distribution, metabolism, excretion, and toxicity (ADMET) evaluation results indicated that both compounds had good pharmacokinetic properties and were virtually non-toxic. The study proposes a new approach to computer-aided drug design that is faster and more economical, and can improve hit rates for new drug compounds. The results of this study provide new lead compounds for AD treatment, which may be effective due to their multi-target action. HNNDTA is freely available at https://github.com/lizhj39/HNNDTA.
8
Novel Molecular Representations Using Neumann-Cayley Orthogonal Gated Recurrent Unit. J Chem Inf Model 2023; 63:2656-2666. [PMID: 37075324 DOI: 10.1021/acs.jcim.2c01526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/21/2023]
Abstract
Advances in deep neural networks (DNNs) have made a very powerful machine learning method available to researchers across many fields of study, including the biomedical and cheminformatics communities, where DNNs help to improve tasks such as protein performance, molecular design, drug discovery, etc. Many of those tasks rely on molecular descriptors for representing molecular characteristics in cheminformatics. Despite significant efforts and the introduction of numerous methods that derive molecular descriptors, the quantitative prediction of molecular properties remains challenging. One widely used method of encoding molecule features into bit strings is the molecular fingerprint. In this work, we propose using new Neumann-Cayley Gated Recurrent Units (NC-GRU) inside the Neural Nets encoder (AutoEncoder) to create neural molecular fingerprints (NC-GRU fingerprints). The NC-GRU AutoEncoder introduces orthogonal weights into widely used GRU architecture, resulting in faster, more stable training, and more reliable molecular fingerprints. Integrating novel NC-GRU fingerprints and Multi-Task DNN schematics improves the performance of various molecular-related tasks such as toxicity, partition coefficient, lipophilicity, and solvation-free energy, producing state-of-the-art results on several benchmarks.
9
Gex2SGen: Designing Drug-like Molecules from Desired Gene Expression Signatures. J Chem Inf Model 2023; 63:1882-1893. [PMID: 36971750 DOI: 10.1021/acs.jcim.2c01301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
Abstract
Drug-induced gene expression profiling provides a lot of useful information covering various aspects of drug discovery and development. Most importantly, this knowledge can be used to discover drugs' mechanisms of action. Recently, deep learning-based drug design methods have been in the spotlight due to their ability to explore huge chemical space and design property-optimized target-specific drug molecules. Recent advances in accessibility of open-source drug-induced transcriptomic data along with the ability of deep learning algorithms to understand hidden patterns have opened opportunities for designing drug molecules based on desired gene expression signatures. In this study, we propose a deep learning model, Gex2SGen (Gene Expression 2 SMILES Generation), to generate novel drug-like molecules based on desired gene expression profiles. The model accepts desired gene expression profiles in a cell-specific manner as input and designs drug-like molecules which can elicit the required transcriptomic profile. The model was first tested against individual gene-knocked-out transcriptomic profiles, where the newly designed molecules showed high similarity with known inhibitors of the knocked-out target genes. The model was next applied to a triple negative breast cancer signature profile, where it could generate novel molecules highly similar to known anti-breast cancer drugs. Overall, this work provides a generalized method that first learns the molecular signature of a given cell under a specific condition and then designs new small molecules with drug-like properties.
10
Quantifying Functional-Group-like Structural Fragments in Molecules and Its Applications in Drug Design. J Chem Inf Model 2023; 63:2073-2083. [PMID: 36881497 DOI: 10.1021/acs.jcim.3c00050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023]
Abstract
A functional group in a molecule is a structural fragment consisting of a few atoms or a single atom that imparts reactivity to a molecule. Hence, defining functional groups is crucial in chemistry to predict the properties and reactivities of molecules. However, there is no established method in the literature for defining functional groups based on reactivity parameters. In this work, we addressed this issue by designing a set of predefined structural fragments along with reactivity parameters like electron conjugation and ring strain. This approach uses bond orders and atom connectivities to quantify the presence of these fragments within an organic molecule based on given input molecular coordinates. To assess the effectiveness of this approach, we performed a case study to show the benefits of using these newly designed structural fragments instead of traditional fingerprint-based methods for grouping potential COX1/COX2 inhibitors by screening an approved drug library against the aspirin molecule. The structural fragment-based model for ternary classification of rat oral LD50 of chemicals showed performance similar to the fingerprint-based models. In evaluating the regression model performance for aqueous solubility, log(S), predictions, our approach outperformed the fingerprint-based model.
11
De novo design of anti-tuberculosis agents using a structure-based deep learning method. J Mol Graph Model 2023; 118:108361. [PMID: 36257148 DOI: 10.1016/j.jmgm.2022.108361] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/19/2022] [Revised: 09/10/2022] [Accepted: 10/07/2022] [Indexed: 11/28/2022]
Abstract
Mycobacterium tuberculosis (Mtb) is a pathogen of major concern due to its ability to withstand both first- and second-line antibiotics, leading to drug resistance. Thus, there is a critical need for identification of novel anti-tuberculosis agents targeting Mtb-specific proteins. The ceaseless search for novel antimicrobial agents to combat drug-resistant bacteria can be accelerated by the development of advanced deep learning methods to explore both existing and uncharted regions of the chemical space. The adaptation of deep learning methods to under-explored pathogens such as Mtb is challenging, as most of the existing methods rely on the availability of sufficient target-specific ligand data to design novel small molecules with optimized bioactivity. In this work, we report the design of novel anti-tuberculosis agents targeting the Mtb chorismate mutase protein using a structure-based drug design algorithm. The structure-based deep learning method relies on the knowledge of the target protein's binding site structure alone for conditional generation of novel small molecules. The method eliminates the need for curation of a high-quality target-specific small molecule dataset, which remains a challenge even for many druggable targets, including Mtb chorismate mutase. Novel molecules are proposed that show high complementarity to the target binding site. The graph attention model could identify the probable key binding site residues, which influenced the conditional molecule generator to design new molecules with pharmacophoric features similar to the known inhibitors.
12
FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Brief Bioinform 2022; 23:6702671. [PMID: 36124766 DOI: 10.1093/bib/bbac408] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 05/05/2022] [Revised: 07/28/2022] [Accepted: 08/22/2022] [Indexed: 12/14/2022] Open
Abstract
Accurate prediction of molecular properties, such as physicochemical and bioactive properties, as well as ADME/T (absorption, distribution, metabolism, excretion and toxicity) properties, remains a fundamental challenge for molecular design, especially for drug design and discovery. In this study, we advanced a novel deep learning architecture, termed FP-GNN (fingerprints and graph neural networks), which combined and simultaneously learned information from molecular graphs and fingerprints for molecular property prediction. To evaluate the FP-GNN model, we conducted experiments on 13 public datasets, an unbiased LIT-PCBA dataset and 14 phenotypic screening datasets for breast cell lines. Extensive evaluation results showed that, compared to advanced deep learning and conventional machine learning algorithms, the FP-GNN algorithm achieved state-of-the-art performance on these datasets. In addition, we analyzed the influence of different molecular fingerprints, and the effects of molecular graphs and molecular fingerprints on the performance of the FP-GNN model. Analysis of the anti-noise ability and interpretation ability also indicated that FP-GNN was competitive in real-world situations. Collectively, the FP-GNN algorithm can assist chemists, biologists and pharmacists in predicting and discovering better molecules with desired functions or properties.
13
Biomolecular Topology: Modelling and Analysis. ACTA MATHEMATICA SINICA, ENGLISH SERIES 2022; 38:1901-1938. [PMID: 36407804 PMCID: PMC9640850 DOI: 10.1007/s10114-022-2326-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/27/2022] [Accepted: 07/12/2022] [Indexed: 05/25/2023]
Abstract
With the great advancement of experimental tools, a tremendous amount of biomolecular data has been generated and accumulated in various databases. The high dimensionality, structural complexity, nonlinearity, and entanglements of biomolecular data, ranging from DNA knots, RNA secondary structures, protein folding configurations, chromosomes, DNA origami, and molecular assembly to others at the macromolecular level, pose a severe challenge in their analysis and characterization. In the past few decades, mathematical concepts, models, algorithms, and tools from algebraic topology, combinatorial topology, computational topology, and topological data analysis have demonstrated great power and begun to play an essential role in tackling the biomolecular data challenge. In this work, we introduce biomolecular topology, which concerns the topological problems and models originating from biomolecular systems. More specifically, biomolecular topology encompasses topological structures, properties and relations that emerge from biomolecular structures, dynamics, interactions, and functions. We discuss the various types of biomolecular topology from structures (of proteins, DNAs, and RNAs), protein folding, and protein assembly. A brief discussion of databanks (and databases), theoretical models, and computational algorithms is presented. Further, we systematically review related topological models, including graphs, simplicial complexes, persistent homology, persistent Laplacians, de Rham-Hodge theory, Yau-Hausdorff distance, and the topology-based machine learning models.
14
A multi-task FP-GNN framework enables accurate prediction of selective PARP inhibitors. Front Pharmacol 2022; 13:971369. [DOI: 10.3389/fphar.2022.971369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] Open
Abstract
The PARP (poly ADP-ribose polymerase) family comprises crucial DNA repair enzymes that respond to DNA damage, regulate apoptosis, and maintain genome stability; therefore, PARP inhibitors represent a promising therapeutic strategy for the treatment of various human diseases including COVID-19. In this study, a multi-task FP-GNN (Fingerprint and Graph Neural Networks) deep learning framework was proposed to predict the inhibitory activity of molecules against four PARP isoforms (PARP-1, PARP-2, PARP-5A, and PARP-5B). Compared with baseline predictive models based on four conventional machine learning methods (RF, SVM, XGBoost, and LR) as well as six deep learning algorithms (DNN, Attentive FP, MPNN, GAT, GCN, and D-MPNN), the evaluation results indicate that the multi-task FP-GNN method achieves the best performance, with the highest average BA, F1, and AUC values of 0.753 ± 0.033, 0.910 ± 0.045, and 0.888 ± 0.016 for the test set. In addition, Y-scrambling testing verified that the model's performance was not the result of chance correlation. More importantly, the interpretability of the multi-task FP-GNN model enabled the identification of key structural fragments associated with the inhibition of each PARP isoform. To facilitate the use of the multi-task FP-GNN model in the field, an online webserver called PARPi-Predict and its local version software were created to predict whether compounds bear potential inhibitory activity against PARPs, thereby contributing to the design and discovery of better selective PARP inhibitors.
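Two of the evaluation ideas named in this abstract, balanced accuracy (BA) and Y-scrambling, can be illustrated with a short sketch. It uses the standard binary definition of BA (the mean of sensitivity and specificity); the function names are hypothetical, and the paper's own evaluation code is not reproduced here.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0 or 1).

    Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def scramble_labels(y_true, seed=0):
    """Return a shuffled copy of the labels for a Y-scrambling control.

    A model retrained on scrambled labels should score near chance; if it
    does not, the original performance may reflect chance correlation.
    """
    shuffled = list(y_true)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

For an imbalanced set, a classifier that always predicts the majority class scores a BA of 0.5 even when its plain accuracy is high, which is why BA is often reported alongside F1 and AUC.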
15
In-Silico Drug Toxicity and Interaction Prediction for Plant Complexes Based on Virtual Screening and Text Mining. Int J Mol Sci 2022; 23:ijms231710056. [PMID: 36077464 PMCID: PMC9456415 DOI: 10.3390/ijms231710056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 08/30/2022] [Accepted: 09/01/2022] [Indexed: 11/16/2022] Open
Abstract
Potential drug toxicities and drug interactions of redundant compounds of plant complexes may cause unexpected clinical responses or even severe adverse events. On the other hand, super-additivity of drug interactions between natural products and synthetic drugs may be utilized to gain better performance in disease management. Although datasets were insufficient for training a prediction model, a feasible workflow for predicting both the toxicity and the drug interactions of plant complexes was built in this study for the first time, based on the SwissSimilarity and PubChem platforms. The optimal similarity score threshold for toxicity prediction of this system is 0.6171, based on an analysis of 20 different herbal medicines. From the PubChem database, 31 different sections of toxicity information, such as the "Acute Effects", "NIOSH Toxicity Data", "Interactions", "Hepatotoxicity", "Carcinogenicity", "Symptoms", and "Human Toxicity Values" sections, were retrieved, with dozens of active compounds predicted to exert potential toxicities. In Spatholobus suberectus Dunn (SSD), 9 out of 24 active compounds are predicted to exert synergistic effects on cancer management with various drugs or factors. The synergism between SSD, luteolin and docetaxel in the management of triple-negative breast cancer was proved by the combination index assay, synergy score detection assay, and xenograft model.
16
Integrating concept of pharmacophore with graph neural networks for chemical property prediction and interpretation. J Cheminform 2022; 14:52. [PMID: 35927691 PMCID: PMC9351086 DOI: 10.1186/s13321-022-00634-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Accepted: 07/16/2022] [Indexed: 11/10/2022] Open
Abstract
Recently, graph neural networks (GNNs) have revolutionized the field of chemical property prediction and achieved state-of-the-art results on benchmark data sets. Compared with the traditional descriptor- and fingerprint-based QSAR models, GNNs can learn task related representations, which completely gets rid of the rules defined by experts. However, due to the lack of useful prior knowledge, the prediction performance and interpretability of the GNNs may be affected. In this study, we introduced a new GNN model called RG-MPNN for chemical property prediction that integrated pharmacophore information hierarchically into message-passing neural network (MPNN) architecture, specifically, in the way of pharmacophore-based reduced-graph (RG) pooling. RG-MPNN absorbed not only the information of atoms and bonds from the atom-level message-passing phase, but also the information of pharmacophores from the RG-level message-passing phase. Our experimental results on eleven benchmark and ten kinase data sets showed that our model consistently matched or outperformed other existing GNN models. Furthermore, we demonstrated that applying pharmacophore-based RG pooling to MPNN architecture can generally help GNN models improve the predictive power. The cluster analysis of RG-MPNN representations and the importance analysis of pharmacophore nodes will help chemists gain insights for hit discovery and lead optimization.
17
BoostSweet: Learning molecular perceptual representations of sweeteners. Food Chem 2022; 383:132435. [PMID: 35182866 DOI: 10.1016/j.foodchem.2022.132435] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/05/2021] [Revised: 09/16/2021] [Accepted: 02/09/2022] [Indexed: 11/28/2022]
Abstract
The development of safe artificial sweeteners has attracted considerable interest in the food industry. Previous machine learning (ML) studies based on quantitative structure-activity relationships have provided some molecular principles for predicting sweetness, but these models can be improved via the chemical recognition of sweetness active factors. Our ML model, a soft-vote ensemble model that has a light gradient boosting machine and uses both layered fingerprints and alvaDesc molecular descriptor features, demonstrates state-of-the-art performance, with an AUROC score of 0.961. Based on an analysis of feature importance and dataset, we identified that the number of nitrogen atoms that serve as hydrogen bond donors in molecules can play an essential role in determining sweetness. These results potentially provide an advanced understanding of the relationship between molecular structure and sweetness, which can be used to design new sweeteners based on molecular structural dependence.
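The soft-vote ensemble mentioned in this abstract can be illustrated in general terms: each base model outputs a probability for the positive class, and the ensemble averages them before thresholding. The sketch below is an assumption-labeled illustration, not the BoostSweet implementation; the function names, the equal default weights, and the 0.5 threshold are all hypothetical.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-model positive-class probabilities.

    prob_lists[m][i] is model m's probability that sample i is positive.
    Equal weights are assumed when none are given.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, sample_probs)) / total
        for sample_probs in zip(*prob_lists)  # regroup by sample
    ]


def predict_labels(prob_lists, threshold=0.5, weights=None):
    """Label a sample positive when the averaged probability reaches the threshold."""
    return [int(p >= threshold) for p in soft_vote(prob_lists, weights)]
```

Averaging probabilities, rather than counting hard votes, lets a confident model outweigh an uncertain one on a given sample.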
|
18
|
UnbiasedDTI: Mitigating Real-World Bias of Drug-Target Interaction Prediction by Using Deep Ensemble-Balanced Learning. Molecules 2022; 27:2980. [PMID: 35566330 PMCID: PMC9100109 DOI: 10.3390/molecules27092980] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/07/2022] [Revised: 04/26/2022] [Accepted: 04/28/2022] [Indexed: 01/27/2023] Open
Abstract
Drug-target interaction (DTI) prediction through in vitro methods is expensive and time-consuming. On the other hand, computational methods can save time and money while enhancing drug discovery efficiency. Most of the computational methods frame DTI prediction as a binary classification task. One important challenge is that the number of negative interactions in all DTI-related datasets is far greater than the number of positive interactions, leading to the class imbalance problem. As a result, a trained classifier is biased towards the majority class (negative class), whereas the minority class (interacting pairs) is the one of interest. This class imbalance problem is not widely taken into account in DTI prediction studies, and the few previous studies considering balancing in DTI do not focus on the imbalance issue itself. Additionally, they do not benefit from deep learning models and experimental validation. In this study, we propose a computational framework along with experimental validations to predict drug-target interaction using an ensemble of deep learning models to address the class imbalance problem in the DTI domain. The objective of this paper is to mitigate the bias in the prediction of DTI by focusing on the impact of balancing and maintaining other involved parameters at a constant value. Our analysis shows that the proposed model outperforms unbalanced models with the same architecture trained on BindingDB both computationally and experimentally. These findings demonstrate the significance of balancing, which reduces the bias towards the negative class and leads to better performance. It is important to note that relying on computational results without experimental validation, and solely on AUROC and AUPRC metrics, is not credible, particularly when the testing set remains unbalanced.
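One common way to build a balanced ensemble is to give every member model all positive pairs plus an equally sized random sample of the negatives. The abstract does not spell out the authors' balancing scheme, so the sketch below is only an assumed variant; `balanced_subsets` and its parameters are hypothetical names.

```python
import random


def balanced_subsets(labels, n_models, seed=0):
    """Return n_models index lists for training a balanced ensemble.

    Each list holds every positive sample and an equally sized random
    draw (without replacement) from the negative majority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(neg) < len(pos):
        raise ValueError("expected negatives to be the majority class")
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_models)]


# Toy label vector: 2 interacting pairs among 8 candidates.
labels = [1, 0, 0, 0, 0, 1, 0, 0]
for subset in balanced_subsets(labels, n_models=3):
    print(sorted(subset))
```

Each index list would train one member model, and the members' predictions would then be combined, for example by averaging.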
|
19
|
Visualization of Topological Pharmacophore Space with Graph Edit Distance. ACS Omega 2022; 7:14057-14068. [PMID: 35559135 PMCID: PMC9088954 DOI: 10.1021/acsomega.2c00173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Accepted: 03/25/2022] [Indexed: 06/15/2023]
Abstract
A topological pharmacophore (TP) is a chemical graph-based pharmacophore representation, where nodes are pharmacophoric features (PF) and edges are topological distances between PFs. Previously proposed sparse pharmacophore graphs (SPhGs) for TPs were shown to be effective in identifying structurally different active compounds while maintaining the interpretability of the graphs. However, one limitation of using SPhGs as queries is that many structurally similar SPhGs can be identified from a set of active compounds, requiring the classification and visualization of SPhGs, followed by an understanding of the pharmacophore hypotheses. In this study, we propose a scheme for SPhG analysis based on dimensionality reduction techniques with the graph edit distance (GED) metric. This metric enables measuring similarities among SPhGs in a quantitative manner. The visualization of SPhGs, which themselves are the graphs shared by active compounds, can help us understand the pharmacophore hypotheses as well as the data set. As a proof-of-concept study, we generated two-dimensional SPhG-maps using three dimensionality reduction techniques for six biological targets. A comparison with other pharmacophore representations was also conducted. We demonstrated knowledge extraction (interpretation of the data set) from the generated maps. Our findings include a suitable mapping algorithm as well as a pharmacophore hypothesis analysis procedure using an SPhG-map.
|
20
|
Dowker complex based machine learning (DCML) models for protein-ligand binding affinity prediction. PLoS Comput Biol 2022; 18:e1009943. [PMID: 35385478 PMCID: PMC8985993 DOI: 10.1371/journal.pcbi.1009943] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/14/2021] [Accepted: 02/21/2022] [Indexed: 11/19/2022] Open
Abstract
With the great advancements in experimental data, computational power and learning algorithms, artificial intelligence (AI) based drug design has begun to gain momentum recently. AI-based drug design has great promise to revolutionize pharmaceutical industries by significantly reducing the time and cost in drug discovery processes. However, a major issue remains for all AI-based learning models: efficient molecular representations. Here we propose Dowker complex (DC) based molecular interaction representations and Riemann Zeta function based molecular featurization, for the first time. Molecular interactions between proteins and ligands (or others) are modeled as Dowker complexes. A multiscale representation is generated by using a filtration process, during which a series of DCs are generated at different scales. Combinatorial (Hodge) Laplacian matrices are constructed from these DCs, and the Riemann zeta functions from their spectral information can be used as molecular descriptors. To validate our models, we consider protein-ligand binding affinity prediction. Our DC-based machine learning (DCML) models, in particular, DC-based gradient boosting tree (DC-GBT), are tested on the three most commonly used datasets, i.e., PDBbind-2007, PDBbind-2013 and PDBbind-2016, and extensively compared with other existing state-of-the-art models. It has been found that our DC-based descriptors can achieve state-of-the-art results and have better performance than all machine learning models with traditional molecular descriptors. Our Dowker complex based machine learning models can be used in other tasks in AI-based drug design and molecular data analysis.
With the ever-increasing accumulation of chemical and biomolecular data, data-driven artificial intelligence (AI) models will usher in an era of faster, cheaper and more efficient drug design and drug discovery. However, unlike image, text, video and audio data, molecular data from chemistry and biology have much more complicated three-dimensional structures, as well as physical and chemical properties. Efficient molecular representations and descriptors are key to the success of machine learning models in drug design. Here, we propose Dowker complex based molecular representation and Riemann Zeta function based molecular featurization, for the first time. To characterize the complicated molecular structures and interactions at the atomic level, Dowker complexes are constructed. Based on them, intrinsic mathematical invariants are derived and used as molecular descriptors, which can be further combined with machine learning and deep learning models. Our model has achieved state-of-the-art results in protein-ligand binding affinity prediction, demonstrating its great potential for other drug design and discovery problems.
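To make the idea of zeta-function descriptors from Laplacian spectra concrete, the sketch below evaluates a spectral zeta function, the sum of lambda^(-s) over the positive eigenvalues, at a few values of s. The abstract does not give the exact definition used in the paper, so this standard form, the function name `spectral_zeta` and the choice of s values are assumptions. The eigenvalues in the example are those of the graph Laplacian of a three-node path (0, 1 and 3).

```python
def spectral_zeta(eigenvalues, s, tol=1e-9):
    """Sum lambda**(-s) over the eigenvalues larger than tol.

    Zero eigenvalues (connected components in the graph case) are skipped
    so that the sum stays finite.
    """
    return sum(lam ** (-s) for lam in eigenvalues if lam > tol)


# Laplacian spectrum of the path graph on three nodes.
spectrum = [0.0, 1.0, 3.0]

# One descriptor per exponent; repeating this at every filtration scale
# would give a multiscale feature vector.
features = [spectral_zeta(spectrum, s) for s in (0.5, 1, 2)]
print(features)
```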
|
21
|
Molecular Design Learned from the Natural Product Porphyra-334: Molecular Generation via Chemical Variational Autoencoder versus Database Mining via Similarity Search, A Comparative Study. ACS Omega 2022; 7:8581-8590. [PMID: 35309498 PMCID: PMC8928499 DOI: 10.1021/acsomega.1c06453] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/16/2021] [Accepted: 02/18/2022] [Indexed: 06/14/2023]
Abstract
A comparative study is presented. The method via chemical variational autoencoder (VAE) and the method via similarity search are compared, focusing on their generation ability for new functional molecular design. Using the natural product porphyra-334 as a model molecule, we generated three groups: molecules of mycosporine-like amino acids (MAAs) as seeds (G_SEEDS), molecules generated via chemical VAE (G_VAE) and molecules gathered via similarity search (G_SIM). The numbers of molecules that satisfy the condition for the light absorption ability of porphyra-334 in G_SEEDS, G_VAE, and G_SIM are 52, 138, and 6, respectively. The method via chemical VAE shows promising potential for future molecular design. By using quantum chemistry wave function properties for chemical VAE, we find new molecules that are comparable to porphyra-334, including some with unexpected geometries. At the end, we show a group of molecules found with this method.
|
22
|
In-silico screening of potential target transporters for glycyrrhetinic acid (GA) via deep learning prediction of drug-target interactions. Biochem Eng J 2022. [DOI: 10.1016/j.bej.2022.108375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
23
|
Molecular persistent spectral image (Mol-PSI) representation for machine learning models in drug design. Brief Bioinform 2022; 23:6485012. [PMID: 34958660 DOI: 10.1093/bib/bbab527] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/19/2021] [Revised: 11/01/2021] [Accepted: 11/14/2021] [Indexed: 01/05/2023] Open
Abstract
Artificial intelligence (AI)-based drug design has great promise to fundamentally change the landscape of the pharmaceutical industry. Despite great progress from handcrafted feature-based machine learning models, 3D convolutional neural networks (CNNs) and graph neural networks, effective and efficient representations that characterize the structural, physical, chemical and biological properties of molecular structures and interactions remain a great challenge. Here, we propose an equal-sized molecular 2D image representation, known as the molecular persistent spectral image (Mol-PSI), and combine it with a CNN model for AI-based drug design. Mol-PSI provides a unique one-to-one image representation for molecular structures and interactions. In general, deep models are empowered to achieve better performance with systematically organized representations in image format. A well-designed parallel CNN architecture for adapting Mol-PSIs is developed for protein-ligand binding affinity prediction. Our results, for the three most commonly used databases, including PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016, are better than all traditional machine learning models, as far as we know. Our Mol-PSI model provides a powerful molecular representation that can be widely used in AI-based drug design and molecular data analysis.
|
24
|
The SwissSimilarity 2021 Web Tool: Novel Chemical Libraries and Additional Methods for an Enhanced Ligand-Based Virtual Screening Experience. Int J Mol Sci 2022; 23:811. [PMID: 35054998 PMCID: PMC8776004 DOI: 10.3390/ijms23020811] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 12/20/2021] [Revised: 01/06/2022] [Accepted: 01/07/2022] [Indexed: 01/27/2023] Open
Abstract
Hit finding, scaffold hopping, and structure–activity relationship studies are important tasks in rational drug discovery. Implementation of these tasks strongly depends on the availability of compounds similar to a known bioactive molecule. SwissSimilarity is a web tool for low-to-high-throughput virtual screening of multiple chemical libraries to find molecules similar to a compound of interest. According to the similarity principle, the output list of molecules generated by SwissSimilarity is expected to be enriched in compounds that are likely to share common protein targets with the query molecule and that can, therefore, be acquired and tested experimentally in priority. Compound libraries available for screening using SwissSimilarity include approved drugs, clinical candidates, known bioactive molecules, commercially available and synthetically accessible compounds. The first version of SwissSimilarity launched in 2015 made use of various 2D and 3D molecular descriptors, including path-based FP2 fingerprints and ElectroShape vectors. However, during the last few years, new fingerprinting methods for molecular description have been developed or have become popular. Here we would like to announce the launch of the new version of the SwissSimilarity web tool, which features additional 2D and 3D methods for estimation of molecular similarity: extended-connectivity, MinHash, 2D pharmacophore, extended reduced graph, and extended 3D fingerprints. Moreover, it is now possible to screen for molecular structures having the same scaffold as the query compound. Additionally, all compound libraries available for screening in SwissSimilarity have been updated, and several new ones have been added to the list. Finally, the interface of the website has been comprehensively rebuilt to provide a better user experience. The new version of SwissSimilarity is freely available starting from December 2021.
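Fingerprint-based similarity screening of the kind described above can be sketched with the Tanimoto coefficient, a widely used score for comparing binary fingerprints. The abstract does not say which score the web tool applies to each fingerprint type, so the choice of Tanimoto, the function names and the toy library below are assumptions for illustration only.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_library(query, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical on-bit sets standing in for real molecular fingerprints.
library = {
    "compound_A": {1, 2, 3, 4},
    "compound_B": {2, 3, 9},
    "compound_C": {7, 8},
}
print(rank_library({1, 2, 3}, library))
```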
|
25
|
Ligand-Based Virtual Screening Based on the Graph Edit Distance. Int J Mol Sci 2021; 22:12751. [PMID: 34884555 PMCID: PMC8658044 DOI: 10.3390/ijms222312751] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2021] [Revised: 11/12/2021] [Accepted: 11/13/2021] [Indexed: 11/25/2022] Open
Abstract
Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of representations: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, pharmacophore-type node descriptions are represented by nodes and chemical bonds by edges. If we want to obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance allows computing this distance, and it is defined as the cost of transforming one graph into another. Nevertheless, to define this dissimilarity, the transformation cost must be properly tuned. The aim of this paper is to analyse the structure-based screening methods to verify the quality of the Harper transformation costs proposal and to present an algorithm to learn these transformation costs such that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The goodness of the dissimilarity is represented by the classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that with our learned costs, we obtain the highest ratios in identifying the bioactivity similarity in a structurally diverse group of molecules.
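The definition of the graph edit distance given above can be illustrated with a brute-force search over node mappings, which is feasible only for very small graphs. The cost parameters, node labels and graph encoding below are illustrative assumptions; they are neither the Harper costs nor the learned costs discussed in the abstract.

```python
from itertools import permutations


def graph_edit_distance(g1, g2, node_sub=1.0, node_indel=1.0, edge_indel=1.0):
    """Exact graph edit distance by exhaustive search (tiny graphs only).

    A graph is a pair (nodes, edges): nodes maps node id -> label and
    edges is a set of frozensets, each holding two node ids.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    ids1 = list(nodes1)
    # None plays the role of "deleted": a g1 node mapped to None is removed.
    targets = list(nodes2) + [None] * len(ids1)
    best = float("inf")
    for image in permutations(targets, len(ids1)):
        mapping = dict(zip(ids1, image))
        cost = 0.0
        for u, v in mapping.items():
            if v is None:
                cost += node_indel              # node deletion
            elif nodes1[u] != nodes2[v]:
                cost += node_sub                # label substitution
        used = {v for v in image if v is not None}
        cost += node_indel * (len(nodes2) - len(used))   # node insertions
        kept = set()
        for a, b in edges1:
            e = frozenset((mapping[a], mapping[b]))
            if None not in e and e in edges2:
                kept.add(e)                     # edge preserved
            else:
                cost += edge_indel              # edge deletion
        cost += edge_indel * (len(edges2) - len(kept))   # edge insertions
        best = min(best, cost)
    return best


path = {frozenset(e) for e in [(0, 1), (1, 2)]}
g1 = ({0: "donor", 1: "aromatic", 2: "acceptor"}, path)
g2 = ({0: "donor", 1: "aromatic", 2: "hydrophobic"}, path)
print(graph_edit_distance(g1, g2))                 # one label substitution
print(graph_edit_distance(g1, g2, node_sub=0.5))   # cheaper substitution cost
```

Changing `node_sub`, `node_indel` and `edge_indel` changes which molecules count as close, which is why the abstract stresses tuning or learning these costs.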
|
26
|
Abstract
In recent years, deep learning-based methods have emerged as promising tools for de novo drug design. Most of these methods are ligand-based, where an initial target-specific ligand data set is necessary to design potent molecules with optimized properties. Although there have been attempts to develop alternative ways to design target-specific ligand data sets, availability of such data sets remains a challenge while designing molecules against novel target proteins. In this work, we propose a deep learning-based method, where the knowledge of the active site structure of the target protein is sufficient to design new molecules. First, a graph attention model was used to learn the structure and features of the amino acids in the active site of proteins that are experimentally known to form protein-ligand complexes. Next, the learned active site features were used along with a pretrained generative model for conditional generation of new molecules. A bioactivity prediction model was then used in a reinforcement learning framework to optimize the conditional generative model. We validated our method against two well-studied proteins, Janus kinase 2 (JAK2) and dopamine receptor D2 (DRD2), where we produce molecules similar to the known inhibitors. The graph attention model could identify the probable key active site residues, which influenced the conditional molecule generator to design new molecules with pharmacophoric features similar to the known inhibitors.
|
27
|
Abstract
The aim of scaffold hopping (SH) is to find compounds whose scaffolds differ from those of already known active compounds, opening up unexplored regions of chemical space. We previously demonstrated the usefulness of pharmacophore graphs (PhGs) for this purpose through proof-of-concept virtual screening experiments. PhGs consist of nodes and edges corresponding to pharmacophoric features (PFs) and their topological distances. Although PhGs were effective in SH, they are hard to interpret as they are complete graphs. Herein, we introduce an intuitive representation of a molecule, termed the sparse pharmacophore graph (SPhG), which keeps the topological distances among PFs as much as possible while reducing the number of edges in the graph. Several benchmark calculations quantitatively confirmed the sparseness of the graphs and the preservation of topological distances among pharmacophoric points. As proof-of-concept applications, virtual screening (VS) trials for SH were conducted using active and inactive compounds from ChEMBL and PubChem databases for three biological targets: thrombin, tyrosine kinase ABL1, and κ-opioid receptor. The VS performance was comparable to that obtained with fully connected PhGs. Furthermore, highly ranked SPhGs were interpretable for the three biological targets, in particular for thrombin, for which selected SPhGs were in agreement with the structure-based interpretation.
|
28
|
Abstract
Chemical engineering is being rapidly transformed by the tools of data science. On the horizon, artificial intelligence (AI) applications will impact a huge swath of our work, ranging from the discovery and design of new molecules to operations and manufacturing and many areas in between. Early adoption of data science, machine learning, and early examples of AI in chemical engineering has been rich with examples of molecular data science: the application of such tools to molecular discovery and property optimization at the atomic scale. We summarize key advances in this nascent subfield while introducing molecular data science for a broad chemical engineering readership. We introduce the field through the concept of a molecular data science life cycle and discuss relevant aspects of five distinct phases of this process: creation of curated data sets, molecular representations, data-driven property prediction, generation of new molecules, and feasibility and synthesizability considerations.
|
29
|
Abstract
Fragment-based drug design has introduced a bottom-up process for drug development, with improved sampling of chemical space and increased effectiveness in early drug discovery. Here, we combine the use of pharmacophores, the most general concept for representing drug-target interactions, with the theory of protein hotspots to develop a design protocol for fragment libraries. The SpotXplorer approach compiles small fragment libraries that maximize the coverage of experimentally confirmed binding pharmacophores at the most preferred hotspots. The efficiency of this approach is demonstrated with a pilot library of 96 fragment-sized compounds (SpotXplorer0) that is validated on popular target classes and emerging drug targets. Biochemical screening against a set of GPCRs and proteases retrieves compounds containing an average of 70% of known pharmacophores for these targets. More importantly, SpotXplorer0 screening identifies confirmed hits against recently established challenging targets such as the histone methyltransferase SETD2, the main protease (3CLPro) and the NSP3 macrodomain of SARS-CoV-2.
|
30
|
Forman persistent Ricci curvature (FPRC)-based machine learning models for protein-ligand binding affinity prediction. Brief Bioinform 2021; 22:6262241. [PMID: 33940588 DOI: 10.1093/bib/bbab136] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/17/2021] [Revised: 03/14/2021] [Accepted: 03/23/2021] [Indexed: 01/01/2023] Open
Abstract
Artificial intelligence (AI) techniques have gradually been applied to the entire drug design process, from target discovery, lead discovery, lead optimization and preclinical development to the final three phases of clinical trials. Currently, one of the central challenges for AI-based drug design is molecular featurization, which is to identify or design appropriate molecular descriptors or fingerprints. Efficient and transferable molecular descriptors are key to the success of all AI-based drug design models. Here we propose Forman persistent Ricci curvature (FPRC)-based molecular featurization and feature engineering, for the first time. Molecular structures and interactions are modeled as simplicial complexes, which generalize graphs to higher dimensions. Further, a multiscale representation is achieved through a filtration process, during which a series of nested simplicial complexes at different scales are generated. Forman Ricci curvatures (FRCs) are calculated on the series of simplicial complexes, and the persistence and variation of FRCs during the filtration process are defined as FPRC. Moreover, persistent attributes, which are FPRC-based functions and properties, are employed as molecular descriptors, and combined with machine learning models, in particular, gradient boosting tree (GBT). Our FPRC-GBT models are extensively trained and tested on the three most commonly used datasets, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. It has been found that our results are better than the ones from machine learning models with traditional molecular descriptors.
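A much simplified sketch of curvature along a filtration: for an unweighted graph with no higher-dimensional faces, the Forman-Ricci curvature of an edge (u, v) reduces to 4 - deg(u) - deg(v). The abstract works with simplicial complexes, so restricting to plain graphs, building the graph from a distance cutoff, and the function name `forman_curvatures` are all assumptions made to keep the example short.

```python
import math
from itertools import combinations


def forman_curvatures(points, cutoff):
    """Edge curvatures of the graph linking points closer than cutoff.

    Uses the unweighted, graph-only form F(u, v) = 4 - deg(u) - deg(v).
    """
    edges = [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= cutoff
    ]
    degree = [0] * len(points)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return {(i, j): 4 - degree[i] - degree[j] for i, j in edges}


# Three collinear "atoms"; growing the cutoff mimics a filtration.
atoms = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
for cutoff in (1.0, 2.0):
    print(cutoff, forman_curvatures(atoms, cutoff))
```

Tracking how these values change as the cutoff grows is the graph-level analogue of the persistence described in the abstract.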
|
31
|
Persistent spectral hypergraph based machine learning (PSH-ML) for protein-ligand binding affinity prediction. Brief Bioinform 2021; 22:6219114. [PMID: 33837771 DOI: 10.1093/bib/bbab127] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/12/2021] [Revised: 03/14/2021] [Accepted: 03/16/2021] [Indexed: 12/21/2022] Open
Abstract
Molecular descriptors are essential not only to quantitative structure activity/property relationship (QSAR/QSPR) models but also to machine learning based chemical and biological data analysis. In this paper, we propose persistent spectral hypergraph (PSH) based molecular descriptors or fingerprints for the first time. Our PSH-based molecular descriptors are used in the characterization of molecular structures and interactions, and further combined with machine learning models, in particular gradient boosting tree (GBT), for protein-ligand binding affinity prediction. Unlike traditional molecular descriptors, which are usually based on molecular graph models, a hypergraph-based topological representation is proposed for protein-ligand interaction characterization. Moreover, a filtration process is introduced to generate a series of nested hypergraphs at different scales. For each of these hypergraphs, its eigen spectrum information can be obtained from the corresponding (Hodge) Laplacian matrix. PSH studies the persistence and variation of the eigen spectrum of the nested hypergraphs during the filtration process. Molecular descriptors or fingerprints can be generated from persistent attributes, which are statistical or combinatorial functions of PSH, and combined with machine learning models, in particular, GBT. We test our PSH-GBT model on the three most commonly used datasets, including PDBbind-2007, PDBbind-2013 and PDBbind-2016. Our results, for all these databases, are better than all existing machine learning models with traditional molecular descriptors, as far as we know.
|
32
|
Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00301-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 01/04/2023]
|
33
|
Comparison of Cellular Morphological Descriptors and Molecular Fingerprints for the Prediction of Cytotoxicity- and Proliferation-Related Assays. Chem Res Toxicol 2021; 34:422-437. [PMID: 33522793 DOI: 10.1021/acs.chemrestox.0c00303] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/22/2022]
Abstract
Cell morphology features, such as those from the Cell Painting assay, can be generated at relatively low cost and represent versatile biological descriptors of a system and thereby of compound response. In this study, we explored cell morphology descriptors and molecular fingerprints, separately and in combination, for the prediction of cytotoxicity- and proliferation-related in vitro assay endpoints. We selected 135 compounds from the MoleculeNet ToxCast benchmark data set which were annotated with Cell Painting readouts, where the relatively small size of the data set is due to the overlap of required annotations. We trained Random Forest classification models using nested cross-validation and Cell Painting descriptors, Morgan and ErG fingerprints, and their combinations. While using leave-one-cluster-out cross-validation (with clusters based on physicochemical descriptors), models using Cell Painting descriptors achieved higher average performance over all assays (Balanced Accuracy of 0.65, Matthews Correlation Coefficient of 0.28, and AUC-ROC of 0.71) compared to models using ErG fingerprints (BA 0.55, MCC 0.09, and AUC-ROC 0.60) and Morgan fingerprints alone (BA 0.54, MCC 0.06, and AUC-ROC 0.56). While using random shuffle splits, the combination of Cell Painting descriptors with ErG and Morgan fingerprints further improved balanced accuracy on average by 8.9% (in 9 out of 12 assays) and 23.4% (in 8 out of 12 assays) compared to using only ErG and Morgan fingerprints, respectively. Regarding feature importance, Cell Painting descriptors related to nuclei texture, granularity of cells, and cytoplasm as well as cell neighbors and radial distributions were identified as contributing most, which is plausible given the endpoint considered.
We conclude that cell morphological descriptors contain complementary information to molecular fingerprints which can be used to improve the performance of predictive cytotoxicity models, in particular in areas of novel structural space.
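For readers less familiar with the two threshold-based metrics reported above, the sketch below computes balanced accuracy and the Matthews correlation coefficient from a binary confusion matrix using their standard definitions. The counts in the example are invented for illustration and are not data from the study.

```python
import math


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented confusion matrix: 10 toxic and 10 non-toxic compounds.
counts = dict(tp=8, fp=4, tn=6, fn=2)
print(balanced_accuracy(**counts))
print(matthews_corrcoef(**counts))
```

Both metrics correct for class imbalance in different ways, which is why they are reported alongside AUC-ROC.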
|
34
|
Abstract
Molecular descriptors encode a variety of molecular representations for computer-assisted drug discovery. Here, we focus on the Weighted Holistic Atom Localization and Entity Shape (WHALES) descriptors, which were originally designed for scaffold hopping from natural products to synthetic molecules. WHALES descriptors capture molecular shape and partial charges simultaneously. We introduce the key aspects of the WHALES concept and provide a step-by-step guide on how to use these descriptors for virtual compound screening and scaffold hopping. The results presented can be reproduced by using the code freely available at github.com/ETHmodlab/scaffold_hopping_whales.
|
35
|
Abstract
Molecular scaffolds are widely used in drug design. Many methods and tools have been developed to utilize the information in scaffolds. Scaffold diversification is frequently used by medicinal chemists in tasks such as lead compound optimization, but tools for scaffold diversification are still lacking. Here, we propose AIScaffold (https://iaidrug.stonewise.cn), a web-based tool for scaffold diversification using a deep generative model. This tool can perform large-scale (up to 500,000 molecules) diversification in several minutes and recommend the top 500 (top 0.1%) molecules. Features such as site-specific diversification are also supported. This tool can facilitate the scaffold diversification process for medicinal chemists, thereby accelerating drug design.
|
36
|
Learning the Edit Costs of Graph Edit Distance Applied to Ligand-Based Virtual Screening. Curr Top Med Chem 2020; 20:1582-1592. [PMID: 32493194 PMCID: PMC7536799 DOI: 10.2174/1568026620666200603122000] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/23/2019] [Revised: 11/19/2019] [Accepted: 12/07/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Graph edit distance is a methodology used to solve error-tolerant graph matching. This methodology estimates a distance between two graphs by determining the minimum number of modifications required to transform one graph into the other. These modifications, known as edit operations, have an associated edit cost that has to be determined depending on the problem. OBJECTIVE: This study focuses on the use of optimization techniques in order to learn the edit costs used when comparing graphs by means of the graph edit distance. METHODS: Graphs represent reduced structural representations of molecules using pharmacophore-type node descriptions to encode the relevant molecular properties. This reduction technique is known as extended reduced graphs. The screening and statistical tools available on the ligand-based virtual screening benchmarking platform and the RDKit were used. RESULTS: In the experiments, the graph edit distance using learned costs performed as well as or better than using predefined costs. This is exemplified with six publicly available datasets: DUD-E, MUV, GLL&GDD, CAPST, NRLiSt BDB, and ULS-UDS. CONCLUSION: This study shows that the graph edit distance along with learned edit costs is useful to identify bioactivity similarities in a structurally diverse group of molecules. Furthermore, the target-specific edit costs might provide useful structure-activity information for future drug-design efforts.
|
37
|
Abstract
Recently, molecular fingerprints extracted from three-dimensional (3D) structures using advanced mathematics, such as algebraic topology, differential geometry, and graph theory have been paired with efficient machine learning, especially deep learning algorithms to outperform other methods in drug discovery applications and competitions. This raises the question of whether classical 2D fingerprints are still valuable in computer-aided drug discovery. This work considers 23 datasets associated with four typical problems, namely protein-ligand binding, toxicity, solubility and partition coefficient to assess the performance of eight 2D fingerprints. Advanced machine learning algorithms including random forest, gradient boosted decision tree, single-task deep neural network and multitask deep neural network are employed to construct efficient 2D-fingerprint based models. Additionally, appropriate consensus models are built to further enhance the performance of 2D-fingerprint-based methods. It is demonstrated that 2D-fingerprint-based models perform as well as the state-of-the-art 3D structure-based models for the predictions of toxicity, solubility, partition coefficient and protein-ligand binding affinity based on only ligand information. However, 3D structure-based models outperform 2D fingerprint-based methods in complex-based protein-ligand binding affinity predictions.
|
38
|
|
39
|
Abstract
Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify the biomolecular structural complexity and reduce ML dimensionality have emerged as a prime winner in D3R Grand Challenges. This review is devoted to the recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches, including algebraic topology, differential geometry, and graph theory. We elucidate how the physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein-ligand binding predictions in this review although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, etc.
|
40
|
Abstract
Extended reduced graphs provide summary representations of chemical structures using pharmacophore-type node descriptions to encode the relevant molecular properties. Commonly used similarity measures using reduced graphs convert these graphs into 2D vectors like fingerprints, before chemical comparisons are made. This study investigates the effectiveness of a graph-only driven molecular comparison by using extended reduced graphs along with graph edit distance methods for molecular similarity calculation as a tool for ligand-based virtual screening applications, which estimate the bioactivity of a chemical on the basis of the bioactivity of similar compounds. The results proved to be very stable and the graph editing distance method performed better than other methods previously used on reduced graphs. This is exemplified with six publicly available data sets: DUD-E, MUV, GLL&GDD, CAPST, NRLiSt BDB, and ULS-UDS. The screening and statistical tools available on the ligand-based virtual screening benchmarking platform and the RDKit were also used. In the experiments, our method performed better than other molecular similarity methods which use array representations in most cases. Overall, it is shown that extended reduced graphs along with graph edit distance is a combination of methods that has numerous applications and can identify bioactivity similarities in a structurally diverse group of molecules.
|
41
|
|
42
|
Understanding Molecular Drivers of Melanin Binding To Support Rational Design of Small Molecule Ophthalmic Drugs. J Med Chem 2018; 61:10106-10115. [PMID: 30398862 DOI: 10.1021/acs.jmedchem.8b01281] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/22/2023]
Abstract
Binding of drugs to ocular melanin is a prominent biological phenomenon that affects the local pharmacokinetics and pharmacodynamics in the eye. In this work, we report on the development of in vitro and in silico tools for an early assessment and prediction of melanin binding properties of small molecules. A robust high-throughput assay has been established to study the binding of large sets of compounds to melanin. The extremely randomized trees approach was used to develop an in silico model able to predict the extent of melanin binding from the molecular properties of the compounds. After the last iteration of the model, strong melanin binders could prospectively be identified with 91% accuracy. On the basis of in vitro data generated for approximately 3400 chemically diverse drug-like small molecules, pronounced correlations were observed between the extent of melanin binding and the basicity, lipophilicity, and aromaticity of the compounds.
|
43
|
Applying machine learning techniques to predict the properties of energetic materials. Sci Rep 2018; 8:9059. [PMID: 29899464 PMCID: PMC5998124 DOI: 10.1038/s41598-018-27344-x] [Citation(s) in RCA: 104] [Impact Index Per Article: 17.3] [Received: 01/22/2018] [Accepted: 06/01/2018] [Indexed: 11/23/2022] Open
Abstract
We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Up until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods - sum over bonds, custom descriptors, Coulomb matrices, Bag of Bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite having a small data set, we obtain acceptable errors and Pearson correlations for the prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties out of sample. By including another dataset with ≈300 additional molecules in our training we show how the error can be pushed lower, although the convergence with number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.
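The sum-over-bonds (bond counting) featurization named in the abstract can be sketched as follows. This is an illustrative reading of the idea, not the authors' code: the bond-list input format, the helper names `bond_key` and `sum_over_bonds`, and the two hand-written example molecules are assumptions, and only bond orders 1 to 3 are handled.

```python
from collections import Counter


def bond_key(atom_a, atom_b, order):
    """Canonical name for a bond type, e.g. ("N", "C", 3) -> "C#N"."""
    first, second = sorted((atom_a, atom_b))
    symbol = "-=#"[order - 1]  # single, double, triple
    return f"{first}{symbol}{second}"


def sum_over_bonds(molecules):
    """Return (vocabulary, vectors): one bond-type count vector per molecule."""
    counts = [Counter(bond_key(*bond) for bond in mol) for mol in molecules]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[key] for key in vocabulary] for c in counts]
    return vocabulary, vectors


# Hand-written bond lists (one Lewis structure each), for illustration only.
nitromethane = [("C", "H", 1)] * 3 + [("C", "N", 1), ("N", "O", 2), ("N", "O", 1)]
hydrogen_cyanide = [("H", "C", 1), ("C", "N", 3)]

vocabulary, vectors = sum_over_bonds([nitromethane, hydrogen_cyanide])
print(vocabulary)
print(vectors)
```

Each molecule becomes a fixed-length count vector over the bond types seen in the dataset. Such vectors could then be passed to a regression model like the kernel ridge regression the abstract reports as best, which is not implemented here.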
|
44
|
DeCAF-Discrimination, Comparison, Alignment Tool for 2D PHarmacophores. Molecules 2017; 22:E1128. [PMID: 28684712 PMCID: PMC6152008 DOI: 10.3390/molecules22071128] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/18/2017] [Revised: 06/30/2017] [Accepted: 07/04/2017] [Indexed: 01/24/2023] Open
Abstract
Comparison of small molecules is a common component of many cheminformatics workflows, including the design of new compounds and libraries as well as side-effect predictions and drug repurposing. Currently, large-scale comparison methods rely mostly on simple fingerprint representation of molecules, which take into account the structural similarities of compounds. Methods that utilize 3D information depend on multiple conformer generation steps, which are computationally expensive and can greatly influence their results. The aim of this study was to augment molecule representation with spatial and physicochemical properties while simultaneously avoiding conformer generation. To achieve this goal, we describe a molecule as an undirected graph in which the nodes correspond to atoms with pharmacophoric properties and the edges of the graph represent the distances between features. This approach combines the benefits of a conformation-free representation of a molecule with additional spatial information. We implemented our approach as an open-source Python module called DeCAF (Discrimination, Comparison, Alignment tool for 2D PHarmacophores), freely available at http://bitbucket.org/marta-sd/decaf. We show DeCAF's strengths and weaknesses with usage examples and thorough statistical evaluation. Additionally, we show that our method can be manually tweaked to further improve the results for specific tasks. The full dataset on which DeCAF was evaluated and all scripts used to calculate and analyze the results are also provided.
|
45
|
Chemoinformatics at the University of Sheffield 2002-2014. Mol Inform 2016; 34:598-607. [PMID: 27490711 DOI: 10.1002/minf.201500004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/09/2015] [Accepted: 03/13/2015] [Indexed: 11/09/2022]
Abstract
This paper summarises work in chemoinformatics carried out in the Information School of the University of Sheffield during the period 2002-2014. Research studies are described on fingerprint-based similarity searching, data fusion, applications of reduced graphs and pharmacophore mapping, and on the School's teaching in chemoinformatics.
|
46
|
SwissSimilarity: A Web Tool for Low to Ultra High Throughput Ligand-Based Virtual Screening. J Chem Inf Model 2016; 56:1399-404. [PMID: 27391578 DOI: 10.1021/acs.jcim.6b00174] [Citation(s) in RCA: 181] [Impact Index Per Article: 22.6] [Indexed: 11/28/2022]
Abstract
SwissSimilarity is a new web tool for rapid ligand-based virtual screening of small to unprecedented ultralarge libraries of small molecules. Screenable compounds include drugs, bioactive and commercial molecules, as well as 205 million virtual compounds readily synthesizable from commercially available synthetic reagents. Predictions can be carried out on-the-fly using six different screening approaches, including 2D molecular fingerprints as well as superpositional and fast nonsuperpositional 3D similarity methodologies. SwissSimilarity is part of a large initiative of the SIB Swiss Institute of Bioinformatics to provide online tools for computer-aided drug design, such as SwissDock, SwissBioisostere, or SwissTargetPrediction, with which it can interoperate, and is linked to other well-established online tools and databases. User interface and backend have been designed for simplicity and ease of use, to provide proficient virtual screening capabilities to specialists and nonexperts in the field. SwissSimilarity is accessible free of charge or login at http://www.swisssimilarity.ch.
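As a rough illustration of fingerprint-based ligand screening in general, the sketch below ranks a toy library against a query by Tanimoto similarity on sets of "on" bit indices. The abstract does not state which similarity metric SwissSimilarity uses, so the Tanimoto coefficient is an assumption here, as are the function names, the compound names, and the fingerprint bits.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def rank_library(query, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up fingerprints for illustration only.
query = {1, 4, 7, 9}
library = {
    "cmpd_C": {2, 3},
    "cmpd_B": {1, 4, 8},
    "cmpd_A": {1, 4, 7, 9},
}

for name, score in rank_library(query, library):
    print(name, round(score, 2))
```

In a real screen the fingerprints would come from a cheminformatics toolkit and the library would be far larger, but the ranking step has the same shape.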
|
47
|
inSARa: intuitive and interactive SAR interpretation by reduced graphs and hierarchical MCS-based network navigation. J Chem Inf Model 2014; 54:1578-95. [PMID: 24850242 DOI: 10.1021/ci4007547] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
The analysis of Structure-Activity-Relationships (SAR) of small molecules is a fundamental task in drug discovery. Although a large number of methods are already published, there is still a strong need for novel intuitive approaches. The inSARa (intuitive networks for Structure-Activity Relationships analysis) method introduced herein takes advantage of the synergistic combination of reduced graphs (RG) and the intuitive maximum common substructure (MCS) concept. The main feature of the inSARa concept is a hierarchical network structure of clearly defined substructure relationships based on common pharmacophoric features. Thus, straightforward SAR interpretation is possible by interactive network navigation. When focusing on a set of active molecules at one single target, the resulting inSARa networks are shown to be valuable for various essential tasks in SAR analysis, such as the identification of activity cliffs or "activity switches", bioisosteric exchanges, common pharmacophoric features, or "SAR hotspots".
|
48
|
Abstract
The measurement of molecular similarity is an essential part of various machine learning tasks in chemical informatics. Graph kernels provide good similarity measures between molecules. Conventional graph kernels are based on counting common subgraphs of specific types in the molecular graphs. This approach has two primary limitations: (i) only exact subgraph matching is considered in the counting operation, and (ii) most of the subgraphs will be less relevant to a given task. In order to address the above-mentioned limitations, we propose a new graph kernel as an extension of the subtree kernel initially proposed by Ramon and Gärtner (2003). The proposed kernel tolerates an inexact match between subgraphs by allowing matching between atoms with similar local environments. In addition, the proposed kernel provides a method to assign an importance weight to each subgraph according to the relevance to the task, which is predetermined by a statistical test. These extensions are evaluated for classification and regression tasks of predicting a wide range of pharmaceutical properties from molecular structures, with promising results.
|
49
|
'Fuzziness' in pharmacophore-based virtual screening and de novo design. DRUG DISCOVERY TODAY. TECHNOLOGIES 2013; 7:e203-70. [PMID: 24103799 DOI: 10.1016/j.ddtec.2010.10.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
|
50
|
Discovery of Pteridin-7(8H)-one-Based Irreversible Inhibitors Targeting the Epidermal Growth Factor Receptor (EGFR) Kinase T790M/L858R Mutant. J Med Chem 2013; 56:7821-37. [DOI: 10.1021/jm401045n] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Indexed: 01/18/2023]
|