1
|
Vittorio S, Lunghini F, Morerio P, Gadioli D, Orlandini S, Silva P, Martinovic J, Pedretti A, Bonanni D, Del Bue A, Palermo G, Vistoli G, Beccari AR. Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024; 23:2141-2151. [PMID: 38827235 PMCID: PMC11141151 DOI: 10.1016/j.csbj.2024.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/04/2024] Open
Abstract
Molecular docking is a widely used technique in drug discovery to predict the binding mode of a given ligand to its target. However, identifying the near-native binding pose in docking experiments remains a challenging task, because the scoring functions currently employed by docking programs are parametrized to predict binding affinity and therefore often fail to identify the native binding conformation of the ligand. Selecting the correct binding mode is crucial to obtaining meaningful results and to conveniently optimizing new hit compounds. Deep learning (DL) algorithms have been an area of growing interest in this sense for their capability to extract the relevant information directly from the protein-ligand structure. Our review aims to present the recent advances regarding the development of DL-based pose selection approaches, discussing limitations and possible future directions. Moreover, a comparison between the performances of some classical scoring functions and DL-based methods concerning their ability to select the correct binding mode is reported. In this regard, two novel DL-based pose selectors developed by us are presented.
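As an illustration of what "selecting the correct binding mode" can mean in practice, the following minimal sketch computes a top-1 success rate for a pose selector. It is not taken from the review: the `Pose` class, the "higher score is better" convention, and the 2.0 Å RMSD cutoff are assumptions chosen for the example (the cutoff is a common convention in docking benchmarks).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pose:
    """Hypothetical container for one docked pose (not from the review)."""
    score: float           # selector output; higher = more native-like (assumed)
    rmsd_to_native: float  # RMSD in angstroms to the experimental pose


def top1_success(poses: List[Pose], rmsd_cutoff: float = 2.0) -> bool:
    """True if the best-scored pose lies within the RMSD cutoff."""
    best = max(poses, key=lambda p: p.score)
    return best.rmsd_to_native <= rmsd_cutoff


def success_rate(complexes: List[List[Pose]], rmsd_cutoff: float = 2.0) -> float:
    """Fraction of complexes whose top-scored pose is near-native."""
    hits = sum(top1_success(poses, rmsd_cutoff) for poses in complexes)
    return hits / len(complexes)


# Two toy complexes: the selector picks a near-native pose only for the first.
complexes = [
    [Pose(score=0.9, rmsd_to_native=1.2), Pose(score=0.4, rmsd_to_native=5.0)],
    [Pose(score=0.2, rmsd_to_native=1.0), Pose(score=0.8, rmsd_to_native=6.3)],
]
rate = success_rate(complexes)
```

The same function can be applied to a classical scoring function and to a DL-based selector on identical pose sets, which is the kind of comparison the abstract describes.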
Affiliation(s)
- Serena Vittorio
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
| | - Filippo Lunghini
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
| | - Pietro Morerio
- Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
| | - Davide Gadioli
- Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
| | - Sergio Orlandini
- SCAI, SuperComputing Applications and Innovation Department, CINECA, Via dei Tizii 6, Rome 00185, Italy
| | - Paulo Silva
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
| | - Jan Martinovic
- IT4Innovations, VSB – Technical University of Ostrava, 17. listopadu 2172/15, 70800 Ostrava-Poruba, Czech Republic
| | - Alessandro Pedretti
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
| | - Domenico Bonanni
- Department of Physical and Chemical Sciences, University of L′Aquila, via Vetoio, L′Aquila 67010, Italy
| | - Alessio Del Bue
- Pattern Analysis and Computer Vision, Fondazione Istituto Italiano di Tecnologia, Via Morego, 30, 16163 Genova, Italy
| | - Gianluca Palermo
- Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, I-20133 Milano, Italy
| | - Giulio Vistoli
- Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Via Luigi Mangiagalli 25, I-20133 Milano, Italy
| | - Andrea R. Beccari
- EXSCALATE, Dompé Farmaceutici SpA, Via Tommaso de Amicis 95, 80123 Naples, Italy
| |
|
2
|
Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024; 64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]
Abstract
By accelerating time-consuming processes with high efficiency, computing has become an essential part of many modern chemical pipelines. Machine learning is a class of computing methods that can discover patterns within chemical data and utilize this knowledge for a wide variety of downstream tasks, such as property prediction or substance generation. The complex and diverse chemical space requires complex machine learning architectures with great learning power. Recently, learning models based on transformer architectures have revolutionized multiple domains of machine learning, including natural language processing and computer vision. Naturally, there have been ongoing endeavors in adapting these techniques to the chemical domain, resulting in a surge of publications within a short period. The diversity of chemical structures, use cases, and learning models necessitates a comprehensive summarization of existing works. In this paper, we review recent innovations in adapting transformers to solve learning problems in chemistry. Because chemical data is diverse and complex, we structure our discussion based on chemical representations. Specifically, we highlight the strengths and weaknesses of each representation, the current progress of adapting transformer architectures, and future directions.
Affiliation(s)
- Kha-Dinh Luong
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
| | - Ambuj Singh
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
| |
|
3
|
Choi S, Lee J, Seo J, Han SW, Lee SH, Seo JH, Seok J. Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules. Sci Data 2024; 11:371. [PMID: 38605036 PMCID: PMC11009387 DOI: 10.1038/s41597-024-03212-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 04/02/2024] [Indexed: 04/13/2024] Open
Abstract
The simplified molecular-input line-entry system (SMILES) has been utilized in a variety of artificial intelligence analyses owing to its capability of representing chemical structures using line notation. However, its ease of representation is limited, which has led to the proposal of BigSMILES as an alternative method suitable for the representation of macromolecules. Nevertheless, research on BigSMILES remains limited due to its preprocessing requirements. Thus, this study proposes a conversion workflow of BigSMILES, focusing on its automated generation from SMILES representations of homopolymers. BigSMILES representations for 4,927,181 records are provided, thereby enabling its immediate use for various research and development applications. Our study presents detailed descriptions of a validation process to ensure the accuracy, interchangeability, and robustness of the conversion. Additionally, a systematic overview of the utilized codes and functions that emphasizes their relevance in the context of BigSMILES generation is provided. This advancement is anticipated to significantly aid researchers and facilitate further studies in BigSMILES representation, including potential applications in deep learning and further extension to complex structures such as copolymers.
Affiliation(s)
- Sunho Choi
- School of Electrical Engineering, Korea University, Seoul, South Korea
| | - Joonbum Lee
- Department of Materials Science and Engineering, Korea University, Seoul, South Korea
| | - Jangwon Seo
- School of Electrical Engineering, Korea University, Seoul, South Korea
| | - Sung Won Han
- School of Industrial Management Engineering, Korea University, Seoul, South Korea
| | - Sang Hyun Lee
- School of Electrical Engineering, Korea University, Seoul, South Korea
| | - Ji-Hun Seo
- Department of Materials Science and Engineering, Korea University, Seoul, South Korea
| | - Junhee Seok
- School of Electrical Engineering, Korea University, Seoul, South Korea.
| |
|
4
|
Daza D, Alivanistos D, Mitra P, Pijnenburg T, Cochez M, Groth P. BioBLP: a modular framework for learning on multimodal biomedical knowledge graphs. J Biomed Semantics 2023; 14:20. [PMID: 38066573 PMCID: PMC10709903 DOI: 10.1186/s13326-023-00301-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences, or molecular graphs. Other works incorporate such data, but assume that entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. OBJECTIVE We aim to understand how to incorporate multimodal data into biomedical KG embeddings, and analyze the resulting performance in comparison with traditional methods. We propose a modular framework for learning embeddings in KGs with entity attributes, that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction, and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. RESULTS In the standard link prediction evaluation, the proposed method results in competitive, yet lower performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. Further analyses show that incorporating attribute data does outperform baselines over entities below a certain node degree, comprising approximately 75% of the diseases in the graph. We also observe that optimizing attribute encoders is a challenging task that increases optimization costs. 
Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. CONCLUSION BioBLP makes it possible to investigate different ways of incorporating multimodal biomedical data for learning representations in KGs. With a particular implementation, we find that incorporating attribute data does not consistently outperform baselines, but improvements are obtained on a comparatively large subset of entities below a specific node degree. Our results indicate a potential for improved performance in scientific discovery tasks where understudied areas of the KG would benefit from link prediction methods.
Affiliation(s)
- Daniel Daza
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
- University of Amsterdam, Amsterdam, The Netherlands.
- Discovery Lab, Elsevier, Amsterdam, The Netherlands.
| | - Dimitrios Alivanistos
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
- Discovery Lab, Elsevier, Amsterdam, The Netherlands.
| | | | | | - Michael Cochez
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
| | - Paul Groth
- University of Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
| |
|
5
|
Bao H, Zhao J, Zhao X, Zhao C, Lu X, Xu G. Prediction of plant secondary metabolic pathways using deep transfer learning. BMC Bioinformatics 2023; 24:348. [PMID: 37726702 PMCID: PMC10507959 DOI: 10.1186/s12859-023-05485-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 09/14/2023] [Indexed: 09/21/2023] Open
Abstract
BACKGROUND Plant secondary metabolites are highly valued for their applications in pharmaceuticals, nutrition, flavors, and aesthetics. It is of great importance to elucidate plant secondary metabolic pathways due to their crucial roles in biological processes during plant growth and development. However, understanding plant biosynthesis and degradation pathways remains a challenge due to the lack of sufficient information in current databases. To address this issue, we proposed a transfer learning approach using a pre-trained hybrid deep learning architecture that combines Graph Transformer and convolutional neural network (GTC) to predict plant metabolic pathways. RESULTS GTC provides comprehensive molecular representation by extracting both structural features from the molecular graph and textual information from the SMILES string. GTC is pre-trained on the KEGG datasets to acquire general features, followed by fine-tuning on plant-derived datasets. Four metrics were chosen for model performance evaluation. The results show that GTC outperforms six other models, including three previously reported machine learning models, on the KEGG dataset. GTC yields an accuracy of 96.75%, precision of 85.14%, recall of 83.03%, and F1_score of 84.06%. Furthermore, an ablation study confirms the indispensability of all the components of the hybrid GTC model. Transfer learning is then employed to leverage the shared knowledge acquired from the KEGG metabolic pathways. As a result, the transferred GTC exhibits outstanding accuracy in predicting plant secondary metabolic pathways with an average accuracy of 98.30% in fivefold cross-validation and 97.82% on the final test. In addition, GTC is employed to classify natural products. It achieves a perfect accuracy score of 100.00% for alkaloids, while the lowest accuracy score of 98.42% for shikimates and phenylpropanoids. 
CONCLUSIONS The proposed GTC effectively captures molecular features, and achieves high performance in classifying KEGG metabolic pathways and predicting plant secondary metabolic pathways via transfer learning. Furthermore, GTC demonstrates its generalization ability by accurately classifying natural products. A user-friendly executable program has been developed, which only requires the input of the SMILES string of the query compound in a graphical interface.
Affiliation(s)
- Han Bao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
| | - Jinhui Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
| | - Xinjie Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
| | - Chunxia Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
| | - Xin Lu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China.
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China.
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China.
| | - Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China.
- University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China.
- Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China.
| |
|
6
|
Tran T, Ekenna C. Molecular Descriptors Property Prediction Using Transformer-Based Approach. Int J Mol Sci 2023; 24:11948. [PMID: 37569322 PMCID: PMC10419034 DOI: 10.3390/ijms241511948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 07/21/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] Open
Abstract
In this study, we introduce semi-supervised machine learning models designed to predict molecular properties. Our model employs a two-stage approach, involving pre-training and fine-tuning. In particular, our model leverages a substantial amount of labeled and unlabeled data consisting of SMILES strings, a text representation system for molecules. During the pre-training stage, our model capitalizes on the Masked Language Model, which is widely used in natural language processing, for learning molecular chemical space representations. During the fine-tuning stage, our model is trained on a smaller labeled dataset to tackle specific downstream tasks, such as classification or regression. Preliminary results indicate that our model demonstrates comparable performance to state-of-the-art models on the chosen downstream tasks from MoleculeNet. Additionally, to reduce the computational overhead, we propose a new approach that takes advantage of 3D compound structures for calculating the attention score used in the end-to-end transformer model to predict anti-malaria drug candidates. The results show that, using the proposed attention score, our end-to-end model achieves performance comparable to that of pre-trained models.
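The masked-language-model pre-training step mentioned in this abstract can be illustrated with a small sketch that tokenizes a SMILES string and hides a fraction of the tokens for the model to recover. This is not the authors' code: the simplified tokenizer regex, the `[MASK]` token name, and the 15% mask rate (borrowed from common NLP practice) are all assumptions.

```python
import random
import re

# Simplified SMILES tokenizer (an assumption, not the paper's): bracket atoms,
# two-letter halogens, two-digit ring closures, then any single character.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into tokens; joining them restores the input."""
    return TOKEN_RE.findall(smiles)


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with mask_token.

    Returns the masked sequence and a dict {position: original token},
    which serves as the prediction target during pre-training.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    labels = {}
    for i in positions:
        labels[i] = masked[i]
        masked[i] = mask_token
    return masked, labels


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize(aspirin)
masked, labels = mask_tokens(tokens)
```

A model pre-trained this way sees `masked` as input and is penalized for failing to predict the entries of `labels`; fine-tuning then replaces that objective with the downstream classification or regression loss.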
|
7
|
Ramírez-Palacios C, Marrink SJ. Super High-Throughput Screening of Enzyme Variants by Spectral Graph Convolutional Neural Networks. J Chem Theory Comput 2023. [PMID: 36961994 PMCID: PMC10373491 DOI: 10.1021/acs.jctc.2c01227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2023]
Abstract
Finding new enzyme variants with the desired substrate scope requires screening through a large number of potential variants. In a typical in silico enzyme engineering workflow, it is possible to scan a few thousand variants and gather several candidates for further screening or experimental verification. In this work, we show that a Graph Convolutional Neural Network (GCN) can be trained to predict the binding energy of combinatorial libraries of enzyme complexes using only sequence information. The GCN model uses a stack of message-passing and graph pooling layers to extract information from the protein input graph and yield a prediction. The GCN model is agnostic to the identity of the ligand, which is kept constant within the mutant libraries. Using a minuscule subset of the total combinatorial space (204-208 mutants) as training data, the proposed GCN model achieves a high accuracy in predicting the binding energy of unseen variants. The network's accuracy was further improved by injecting feature embeddings obtained from a language module pretrained on 10 million protein sequences. Since no structural information is needed to evaluate new variants, the deep learning algorithm is capable of scoring an enzyme variant in under 1 ms, allowing the search of billions of candidates on a single GPU.
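The "message-passing and graph pooling" pipeline described in this abstract can be sketched in a few lines of plain Python. The sketch below uses simple mean aggregation over neighbors rather than the spectral convolution of the paper, and the function names, weights, and toy graph are hypothetical; it only illustrates the flow from per-node features to a single graph-level prediction.

```python
def mean_aggregate(features, neighbors):
    """One message-passing step: each node averages itself and its neighbors."""
    updated = []
    for node, feat in enumerate(features):
        group = [feat] + [features[j] for j in neighbors[node]]
        updated.append(
            [sum(v[k] for v in group) / len(group) for k in range(len(feat))]
        )
    return updated


def linear_relu(features, weights, bias):
    """Per-node dense layer; weights holds one weight vector per output unit."""
    out = []
    for feat in features:
        row = []
        for unit_weights, b in zip(weights, bias):
            z = sum(x * w for x, w in zip(feat, unit_weights)) + b
            row.append(max(0.0, z))  # ReLU
        out.append(row)
    return out


def mean_pool(features):
    """Graph pooling: average node embeddings into one graph embedding."""
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]


def predict(features, neighbors, weights, bias, readout):
    """Message passing -> dense layer -> pooling -> linear readout (a scalar)."""
    hidden = linear_relu(mean_aggregate(features, neighbors), weights, bias)
    pooled = mean_pool(hidden)
    return sum(p * r for p, r in zip(pooled, readout))


# Toy 3-node path graph (0-1-2) with one feature per node.
features = [[1.0], [2.0], [3.0]]
neighbors = [[1], [0, 2], [1]]
energy = predict(features, neighbors, weights=[[1.0]], bias=[0.0], readout=[2.0])
```

In a trained model the weights and readout would be learned from the labeled mutants, and several such layers would be stacked before pooling.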
Affiliation(s)
- Carlos Ramírez-Palacios
- Molecular Dynamics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
| | - Siewert J Marrink
- Molecular Dynamics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Nijenborgh 7, 9747 AG Groningen, The Netherlands
| |
|
8
|
Baptista D, Ferreira PG, Rocha M. A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer. PLoS Comput Biol 2023; 19:e1010200. [PMID: 36952569 PMCID: PMC10072473 DOI: 10.1371/journal.pcbi.1010200] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/13/2022] [Revised: 04/04/2023] [Accepted: 02/08/2023] [Indexed: 03/25/2023] Open
Abstract
One of the main obstacles to the successful treatment of cancer is the phenomenon of drug resistance. A common strategy to overcome resistance is the use of combination therapies. However, the space of possibilities is huge and efficient search strategies are required. Machine Learning (ML) can be a useful tool for the discovery of novel, clinically relevant anti-cancer drug combinations. In particular, deep learning (DL) has become a popular choice for modeling drug combination effects. Here, we set out to examine the impact of different methodological choices on the performance of multimodal DL-based drug synergy prediction methods, including the use of different input data types, preprocessing steps and model architectures. Focusing on the NCI ALMANAC dataset, we found that feature selection based on prior biological knowledge has a positive impact: limiting gene expression data to cancer or drug response-specific genes improved performance. Drug features appeared to be more predictive of drug response, with a 41% increase in coefficient of determination (R²) and 26% increase in Spearman correlation relative to a baseline model that used only cell line and drug identifiers. Molecular fingerprint-based drug representations performed slightly better than learned representations: ECFP4 fingerprints increased R² by 5.3% and Spearman correlation by 2.8% relative to the best learned representations. In general, fully connected feature-encoding subnetworks outperformed other architectures. DL outperformed other ML methods by more than 35% (R²) and 14% (Spearman). Additionally, an ensemble combining the top DL and ML models improved performance by about 6.5% (R²) and 4% (Spearman). Using a state-of-the-art interpretability method, we showed that DL models can learn to associate drug and cell line features with drug response in a biologically meaningful way.
The strategies explored in this study will help to improve the development of computational methods for the rational design of effective drug combinations for cancer therapy.
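The two metrics reported in this abstract, the coefficient of determination and the Spearman rank correlation, can be computed as in the following dependency-free sketch. The function names and the toy values are illustrative only and do not come from the study; ties are handled by average ranks, which is the usual convention.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy synergy scores (made up for illustration).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
rho = spearman(observed, predicted)
```

Note that the percentage gains quoted in the abstract are relative changes in these metrics between models, not the metric values themselves.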
Affiliation(s)
- Delora Baptista
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- LABBELS - Associate Laboratory, Braga, Guimarães, Portugal
| | - Pedro G Ferreira
- Department of Computer Science, Faculty of Sciences, University of Porto, Porto, Portugal
- INESC TEC, Porto, Portugal
- Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
- i3s - Instituto de Investigação e Inovação em Saúde da Universidade do Porto, Porto, Portugal
| | - Miguel Rocha
- CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
- LABBELS - Associate Laboratory, Braga, Guimarães, Portugal
| |
|
9
|
Accurate predictions of drugs aqueous solubility via deep learning tools. J Mol Struct 2022. [DOI: 10.1016/j.molstruc.2021.131562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
10
|
Deep Learning-Assisted Repurposing of Plant Compounds for Treating Vascular Calcification: An In Silico Study with Experimental Validation. Oxid Med Cell Longev 2022; 2022:4378413. [PMID: 35035662 PMCID: PMC8754599 DOI: 10.1155/2022/4378413] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/06/2021] [Revised: 10/24/2021] [Accepted: 11/13/2021] [Indexed: 12/13/2022]
Abstract
Background Vascular calcification (VC) constitutes subclinical vascular burden and increases cardiovascular mortality. Effective therapeutics for VC remain to be found. We aimed to use a deep learning-based strategy to screen and uncover plant compounds that can potentially be repurposed for managing VC. Methods We integrated drugome, interactome, and diseasome information from the Comparative Toxicogenomics Database (CTD), DrugBank, PubChem, Gene Ontology (GO), and BioGrid to analyze drug-disease associations. Deep representation learning was performed using a high-level description of the local network architecture and features of the entities, followed by learning the global embeddings of nodes derived from a heterogeneous network using the graph neural network architecture and a random forest classifier established for prediction. Predicted results were tested in an in vitro VC model for validity based on the probability scores. Results We collected 6,790 compounds with available Simplified Molecular-Input Line-Entry System (SMILES) data, 11,958 GO terms, 7,238 diseases, and 25,482 proteins, followed by local embedding vectors using an end-to-end transformer network and a node2vec algorithm and global embedding vectors learned from the heterogeneous network via the graph neural network. Our algorithm conferred a good distinction between potential compounds, presenting as higher prediction scores for the compound categories with a higher potential but lower scores for other categories. Probability score-dependent selection revealed that antioxidants such as sulforaphane and daidzein were potentially effective compounds against VC, while catechin had low probability. All three compounds were validated in vitro. Conclusions Our findings exemplify the utility of deep learning in identifying promising VC-treating plant compounds. Our model can be a quick and comprehensive computational screening tool to assist in the early drug discovery process.
|
11
|
|
12
|
Affiliation(s)
- W Patrick Walters
- Relay Therapeutics, 399 Binney Street, Cambridge, Massachusetts 02139, United States
| | - Renxiao Wang
- Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
| |
|
13
|
Gao P, Zhang J, Sun Y, Yu J. Accurate predictions of aqueous solubility of drug molecules via the multilevel graph convolutional network (MGCN) and SchNet architectures. Phys Chem Chem Phys 2020; 22:23766-23772. [PMID: 33063077 DOI: 10.1039/d0cp03596c] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
Deep learning based methods have been widely applied to predict various kinds of molecular properties in the pharmaceutical industry with increasingly more success. In this study, we propose two novel models for aqueous solubility predictions, based on the Multilevel Graph Convolutional Network (MGCN) and SchNet architectures, respectively. The advantage of the MGCN lies in the fact that it could extract the graph features of the target molecules directly from the (3D) structural information; therefore, it doesn't need to rely on a lot of intra-molecular descriptors to learn the features, which are of significance for accurate predictions of the molecular properties. The SchNet performs well in modelling the interatomic interactions inside a molecule, and such a deep learning architecture is also capable of extracting structural information and further predicting the related properties. The actual accuracy of these two novel approaches was systematically benchmarked with four different independent datasets. We found that both the MGCN and SchNet models performed well for aqueous solubility predictions. In the future, we believe such promising predictive models will be applicable to enhancing the efficiency of the screening, crystallization and delivery of drug molecules, essentially as a useful tool to promote the development of molecular pharmaceutics.
Affiliation(s)
- Peng Gao
- School of Chemistry and Molecular Bioscience, University of Wollongong, NSW 2500, Australia
|