1
Yang Y, An D, Wang Y, Zou W, Cui G, Tong J, Feng K, Jing T, Wang L, Shi L, Li C. Wee1 inhibitor optimization through deep-learning-driven decision making. Eur J Med Chem 2024; 280:116912. [PMID: 39369485 DOI: 10.1016/j.ejmech.2024.116912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 09/22/2024] [Accepted: 09/23/2024] [Indexed: 10/08/2024]
Abstract
Deep learning has gained increasing attention in recent years, yielding promising results in hit screening and molecular optimization. Herein, we employed an efficient strategy based on multiple deep learning techniques to optimize Wee1 inhibitors, which involves activity interpretation, scaffold-based molecular generation, and activity prediction. Starting from our in-house Wee1 inhibitor GLX0198 (IC50 = 157.9 nM), we obtained three optimized compounds (IC50 = 13.5 nM, 33.7 nM, and 47.1 nM) out of five picked molecules. Further minor modifications on these compounds led to the identification of potent Wee1 inhibitors with desirable inhibitory effects on multiple cancer cell lines. Notably, the best compound 13 exhibited superior cancer cell inhibition, with IC50 values below 100 nM in all tested cancer cells. These results suggest that deep learning can greatly facilitate decision-making at the stage of molecular optimization.
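As a quick arithmetic check on the potency gains reported above, the following sketch computes the fold improvement of each optimized compound over GLX0198, along with the corresponding pIC50 values. The IC50 figures are taken from the abstract; the function names and the pIC50 framing are illustrative assumptions and are not part of the authors' workflow.

```python
import math

# IC50 values (nM) as reported in the abstract.
PARENT_IC50_NM = 157.9                  # GLX0198
OPTIMIZED_IC50_NM = [13.5, 33.7, 47.1]  # three optimized compounds


def fold_improvement(parent_nm, optimized_nm):
    """Ratio of parent to optimized IC50; values above 1 mean higher potency."""
    return parent_nm / optimized_nm


def pic50(ic50_nm):
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)


for ic50 in OPTIMIZED_IC50_NM:
    fold = fold_improvement(PARENT_IC50_NM, ic50)
    print(f"IC50 = {ic50} nM: {fold:.1f}-fold vs parent, pIC50 = {pic50(ic50):.2f}")
```

On these numbers the most potent of the three compounds is roughly an order of magnitude more active than the starting inhibitor.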
Affiliation(s)
- Duo An
- Galixir, Beijing, 100080, China
2
Si Z, Liu D, Nie W, Hu J, Wang C, Jiang T, Yu H, Fu Y. Data-Based Prediction of Redox Potentials via Introducing Chemical Features into the Transformer Architecture. J Chem Inf Model 2024; 64:8453-8463. [PMID: 39513760 DOI: 10.1021/acs.jcim.4c01299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2024]
Abstract
Rapid and accurate prediction of basic physicochemical parameters of molecules will greatly accelerate the target-oriented design of novel reactions and materials but has long been challenging. Herein, a chemical language model-based deep learning method, TransChem, has been developed for the prediction of redox potentials of organic molecules. By embedding an effective molecular characterization (combining spatial and electronic features), a nonlinear molecular messaging approach (Mol-Attention), and a perturbation learning method, TransChem shows high accuracy in predicting the redox potential of organic radicals on a data set of over 100,000 data points (R² > 0.97, MAE < 0.09 V) and generalizes to the smaller 2,1,3-benzothiadiazole data set (<3000 data points) and electron affinity data set (660 data points) with low MAEs of 0.07 V and 0.18 eV, respectively. In this context, a self-developed data set, i.e., the oxidation potential (OP) of a full-space disubstituted phenol data set (OPP-data set, total set: 74,529), has been predicted by TransChem with a high-throughput active learning strategy. The rapid and reliable prediction of OP could accelerate the screening of plausible reagents in highly selective cross-coupling of phenol derivatives. This study presents an important attempt to guide language modeling with chemical knowledge, while TransChem demonstrates state-of-the-art (SOTA) predictive performance on redox potential prediction benchmark data sets owing to its better understanding of molecular design and conformational relationships.
Affiliation(s)
- Zhan Si
- Department of Chemistry and Centre for Atomic Engineering of Advanced Materials, Anhui Province Key Laboratory of Chemistry for Inorganic/Organic Hybrid Functionalized Materials, Anhui University, Hefei 230601, China
- Deguang Liu
- Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, University of Science and Technology of China, Hefei 230026, China
- Wan Nie
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Jingjing Hu
- Department of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Chen Wang
- Department of Chemistry and Centre for Atomic Engineering of Advanced Materials, Anhui Province Key Laboratory of Chemistry for Inorganic/Organic Hybrid Functionalized Materials, Anhui University, Hefei 230601, China
- Tingting Jiang
- Department of Chemistry and Centre for Atomic Engineering of Advanced Materials, Anhui Province Key Laboratory of Chemistry for Inorganic/Organic Hybrid Functionalized Materials, Anhui University, Hefei 230601, China
- Haizhu Yu
- Department of Chemistry and Centre for Atomic Engineering of Advanced Materials, Anhui Province Key Laboratory of Chemistry for Inorganic/Organic Hybrid Functionalized Materials, Anhui University, Hefei 230601, China
- Yao Fu
- Key Laboratory of Precision and Intelligent Chemistry, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, University of Science and Technology of China, Hefei 230026, China
3
Shaban Tameh M, Coropceanu V, Purcell TAR, Brédas JL. Prediction of the Infrared Absorbance Intensities and Frequencies of Hydrocarbons: A Message Passing Neural Network Approach. J Phys Chem A 2024; 128:9695-9706. [PMID: 39466724 DOI: 10.1021/acs.jpca.4c06745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2024]
Abstract
Accurately and efficiently predicting the infrared (IR) spectra of a molecule can provide insights into the structure-properties relationships of molecular species, which has led to a proliferation of machine learning tools designed for this purpose. However, earlier studies have focused primarily on obtaining normalized IR spectra, which limits their potential for a comprehensive analysis of molecular behavior in the IR range. For instance, to fully understand and predict the optical properties, such as the transparency characteristics, it is necessary to predict the molar absorptivity IR spectra instead. Here, we propose a graph-based communicative message passing neural network algorithm that can predict both the peak positions and absolute intensities corresponding to density functional theory calculated molar absorptivities in the IR domain. By modifying existing spectral loss functions, we show that our method is able to predict with DFT-accuracy level the IR molar absorptivities of a series of hydrocarbons containing up to ten carbon atoms and apply the model to a set of larger molecules. We also compare the predicted spectra with those generated by the direct message passing neural network. The results suggest that both algorithms demonstrate similar predictive capabilities for hydrocarbons, indicating that either model could be effectively used in future research on spectral prediction for such systems.
Affiliation(s)
- Maliheh Shaban Tameh
- Department of Chemistry and Biochemistry, The University of Arizona, Tucson, Arizona 85721-0041, United States
- Veaceslav Coropceanu
- Department of Chemistry and Biochemistry, The University of Arizona, Tucson, Arizona 85721-0041, United States
- Thomas A R Purcell
- Department of Chemistry and Biochemistry, The University of Arizona, Tucson, Arizona 85721-0041, United States
- Jean-Luc Brédas
- Department of Chemistry and Biochemistry, The University of Arizona, Tucson, Arizona 85721-0041, United States
4
Srivastava P, Steuer A, Ferri F, Nicoli A, Schultz K, Bej S, Di Pizio A, Wolkenhauer O. Bitter peptide prediction using graph neural networks. J Cheminform 2024; 16:111. [PMID: 39375808 PMCID: PMC11459932 DOI: 10.1186/s13321-024-00909-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 09/22/2024] [Indexed: 10/09/2024] Open
Abstract
Bitter taste is an unpleasant taste modality that affects food consumption. Bitter peptides are generated during enzymatic processes that produce functional, bioactive protein hydrolysates or during the aging process of fermented products such as cheese, soybean protein, and wine. Understanding the underlying peptide sequences responsible for bitter taste can pave the way for more efficient identification of these peptides. This paper presents BitterPep-GCN, a feature-agnostic graph convolution network for bitter peptide prediction. The graph-based model learns the embedding of amino acids in the bitter peptide sequences and uses mixed pooling for bitter classification. BitterPep-GCN was benchmarked using BTP640, a publicly available bitter peptide dataset. The latent peptide embeddings generated by the trained model were used to analyze the activity of sequence motifs responsible for the bitter taste of the peptides. Particularly, we calculated the activity for individual amino acids and dipeptide, tripeptide, and tetrapeptide sequence motifs present in the peptides. Our analyses pinpoint specific amino acids, such as F, G, P, and R, as well as sequence motifs, notably tripeptide and tetrapeptide motifs containing FF, as key bitter signatures in peptides. This work not only provides a new predictor of bitter taste for a more efficient identification of bitter peptides in various food products but also gives a hint into the molecular basis of bitterness.
Scientific Contribution
Our work provides the first application of Graph Neural Networks for the prediction of peptide bitter taste. The best-developed model, BitterPep-GCN, learns the embedding of amino acids in the bitter peptide sequences and uses mixed pooling for bitter classification. The embeddings were used to analyze the sequence motifs responsible for the bitter taste.
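The motif analysis described above starts from the di-, tri-, and tetrapeptide substrings present in each peptide. A minimal sketch of that enumeration step is given below; the example sequences and the function name `count_motifs` are hypothetical, and the embedding-based activity scoring performed by BitterPep-GCN is not reproduced here.

```python
from collections import Counter


def count_motifs(peptides, sizes=(2, 3, 4)):
    """Count every contiguous substring of the given lengths across all peptides."""
    counts = Counter()
    for seq in peptides:
        for k in sizes:
            # A peptide shorter than k yields an empty range and adds nothing.
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    return counts


# Hypothetical one-letter amino acid sequences, for illustration only.
example_peptides = ["GFFP", "RGFF", "PFFR"]
motif_counts = count_motifs(example_peptides)

# Keep only the motifs that contain the FF dipeptide highlighted in the abstract.
ff_motifs = {motif: n for motif, n in motif_counts.items() if "FF" in motif}
print(ff_motifs)
```

Counting substrings only shows how often a motif occurs; relating motifs to bitterness would still require the model's learned embeddings.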
Affiliation(s)
- Prashant Srivastava
- Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
- Alexandra Steuer
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Francesco Ferri
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Alessandro Nicoli
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Kristian Schultz
- Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
- Saptarshi Bej
- Indian Institute of Science Education and Research Thiruvananthapuram, Maruthamala P. O, Vithura, 695551, Kerala, India
- Antonella Di Pizio
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany.
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany.
- Olaf Wolkenhauer
- Institute of Computer Science, University of Rostock, 18051, Rostock, Germany.
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany.
5
Zhang Q, Yuan Y, Zhang J, Fang P, Pan J, Zhang H, Zhou T, Yu Q, Zou X, Sun Z, Yan F. Machine Learning-Aided Design of Highly Conductive Anion Exchange Membranes for Fuel Cells and Water Electrolyzers. Adv Mater 2024; 36:e2404981. [PMID: 39075826 DOI: 10.1002/adma.202404981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Revised: 07/22/2024] [Indexed: 07/31/2024]
Abstract
Alkaline anion exchange membrane (AEM)-based fuel cells (AEMFCs) and water electrolyzers (AEMWEs) are vital for enabling the efficient and large-scale utilization of hydrogen energy. However, the performance of such energy devices is impeded by the relatively low conductivity of AEMs. The conventional trial-and-error approach to designing membrane structures has proven to be both inefficient and costly. To address this challenge, a fully connected neural network (FCNN) model is developed based on acid-catalyzed AEMs to analyze the relationship between structure and conductivity among 180,000 AEM variations. Under machine learning guidance, anilinium cation-type membranes are designed and synthesized. Molecular dynamics simulations and Mulliken charge population analysis validated that the presence of a large anilinium cation domain is a result of the inductive effect of N+ and benzene rings. The interconnected anilinium cation domains facilitated the formation of a continuous ion transport channel within the AEMs. Additionally, the incorporation of the benzyl electron-withdrawing group heightened the inductive effect, leading to a high-conductivity AEM variant as screened by the machine learning model. Furthermore, based on the highly active and low-cost monomers identified by machine learning, the large-scale synthesis of anilinium-based AEMs confirms the potential for commercial applications.
Affiliation(s)
- Qiuhuan Zhang
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- Yongjiang Yuan
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- Jiale Zhang
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- Pengda Fang
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- Ji Pan
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- Hao Zhang
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- Tao Zhou
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- Qikun Yu
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- Xiuyang Zou
- Jiangsu Engineering Research Center for Environmental Functional Materials, School of Chemistry and Chemical Engineering, Huaiyin Normal University, Huaian, 223300, China
- Zhe Sun
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- Feng Yan
- Jiangsu Engineering Laboratory of Novel Functional Polymeric Materials, Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Suzhou Key Laboratory of Soft Material and New Energy, College of Chemistry, Chemical Engineering and Materials Science, Soochow University, Suzhou, 215123, China
- State Key Laboratory for Modification of Chemical Fibers and Polymer Materials, College of Materials Science and Engineering, Donghua University, Shanghai, 201600, China
6
Tran TTV, Tayara H, Chong KT. AMPred-CNN: Ames mutagenicity prediction model based on convolutional neural networks. Comput Biol Med 2024; 176:108560. [PMID: 38754218 DOI: 10.1016/j.compbiomed.2024.108560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/15/2024] [Accepted: 05/05/2024] [Indexed: 05/18/2024]
Abstract
Mutagenicity assessment plays a pivotal role in the safety evaluation of chemicals, pharmaceuticals, and environmental compounds. In recent years, the development of robust computational models for predicting chemical mutagenicity has gained significant attention, driven by the need for efficient and cost-effective toxicity assessments. In this paper, we proposed AMPred-CNN, an innovative Ames mutagenicity prediction model based on Convolutional Neural Networks (CNNs), uniquely employing molecular structures as images to leverage CNNs' powerful feature extraction capabilities. The study employs the widely used benchmark mutagenicity dataset from Hansen et al. for model development and evaluation. Comparative analyses with traditional ML models on different molecular features reveal substantial performance enhancements. AMPred-CNN outshines these models, demonstrating superior accuracy, AUC, F1 score, MCC, sensitivity, and specificity on the test set. Notably, AMPred-CNN is further benchmarked against seven recent ML and DL models, consistently showcasing superior performance with an impressive AUC of 0.954. Our study highlights the effectiveness of CNNs in advancing mutagenicity prediction, paving the way for broader applications in toxicology and drug development.
Affiliation(s)
- Thi Tuyet Van Tran
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea; Faculty of Information Technology, An Giang University, Long Xuyen 880000, Viet Nam; Vietnam National University-Ho Chi Minh City, Ho Chi Minh 700000, Viet Nam.
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea.
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea; Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea.
7
Zhang H, Fan H, Wang J, Hou T, Saravanan KM, Xia W, Kan HW, Li J, Zhang JZH, Liang X, Chen Y. Revolutionizing GPCR-ligand predictions: DeepGPCR with experimental validation for high-precision drug discovery. Brief Bioinform 2024; 25:bbae281. [PMID: 38864340 PMCID: PMC11167311 DOI: 10.1093/bib/bbae281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 05/05/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024] Open
Abstract
G-protein coupled receptors (GPCRs), crucial in various diseases, are the targets of over 40% of approved drugs. However, the reliable acquisition of experimental GPCR structures is hindered by their lipid-embedded conformations. Traditional protein-ligand interaction models falter in GPCR-drug interactions because the available structures are limited and of low quality. Generalized models, trained on soluble protein-ligand pairs, are also inadequate. To address these issues, we developed two models, DeepGPCR_BC for binary classification and DeepGPCR_RG for affinity prediction. These models use non-structural GPCR-ligand interaction data, leveraging graph convolutional networks and mol2vec techniques to represent binding pockets and ligands as graphs. This approach significantly speeds up predictions while preserving critical physical-chemical and spatial information. In independent tests, DeepGPCR_BC surpassed Autodock Vina and Schrödinger Dock with an area under the curve of 0.72, accuracy of 0.68 and true positive rate of 0.73, whereas DeepGPCR_RG demonstrated a Pearson correlation of 0.39 and root mean squared error of 1.34. We applied these models to screen drug candidates for GPR35 (Q9HC97), yielding promising results with three (F545-1970, K297-0698, S948-0241) out of eight candidates. Furthermore, we also successfully obtained six active inhibitors for GLP-1R. Our GPCR-specific models pave the way for efficient and accurate large-scale virtual screening, potentially revolutionizing drug discovery in the GPCR field.
Affiliation(s)
- Haiping Zhang
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Hongjie Fan
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- Jixia Wang
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
- Tao Hou
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
- Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Agharam Road 173, Selaiyur, Chennai, Tamil Nadu 600073, India
- Wei Xia
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Hei Wun Kan
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Junxin Li
- Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- John Z H Zhang
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
- Xinmiao Liang
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
- Yang Chen
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
8
Kengkanna A, Ohue M. Enhancing property and activity prediction and interpretation using multiple molecular graph representations with MMGX. Commun Chem 2024; 7:74. [PMID: 38580841 PMCID: PMC10997661 DOI: 10.1038/s42004-024-01155-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 03/18/2024] [Indexed: 04/07/2024] Open
Abstract
Graph Neural Networks (GNNs) excel in compound property and activity prediction, but the choice of molecular graph representations significantly influences model learning and interpretation. While atom-level molecular graphs resemble natural topology, they overlook key substructures or functional groups, and their interpretation only partially aligns with chemical intuition. Recent research suggests alternative representations that use reduced molecular graphs to integrate higher-level chemical information and leverage both representations for modeling. However, there is a lack of studies on the applicability and impact of different molecular graphs on model learning and interpretation. Here, we introduce MMGX (Multiple Molecular Graph eXplainable discovery), investigating the effects of multiple molecular graphs, including Atom, Pharmacophore, JunctionTree, and FunctionalGroup, on model learning and interpretation from various perspectives. Our findings indicate that multiple graphs improve model performance, but to varying degrees depending on the dataset. Interpretation from multiple graphs in different views provides more comprehensive features and potential substructures consistent with background knowledge. These results help to understand model decisions and offer valuable insights for subsequent tasks. The concept of multiple molecular graph representations and diverse interpretation perspectives has broad applicability across tasks, architectures, and explanation techniques, enhancing model learning and interpretation for relevant applications in drug discovery.
Affiliation(s)
- Apakorn Kengkanna
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan
- Masahito Ohue
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Kanagawa, 226-8501, Japan.
9
Annotating cell types in single-cell ATAC data via the guidance of the underlying DNA sequences. Nat Comput Sci 2024; 4:261-262. [PMID: 38671305 DOI: 10.1038/s43588-024-00626-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
10
Zheng L, Shi F, Peng C, Xu M, Fan F, Li Y, Zhang L, Du J, Wang Z, Lin Z, Sun Y, Deng C, Duan X, Wei L, Zhao C, Fang L, Zhang P, Ma S, Lai L, Yang M. Application scenario-oriented molecule generation platform developed for drug discovery. Methods 2024; 222:112-121. [PMID: 38215898 DOI: 10.1016/j.ymeth.2023.12.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 11/22/2023] [Accepted: 12/23/2023] [Indexed: 01/14/2024] Open
Abstract
Design of molecules for candidate compound selection is one of the central challenges in drug discovery due to the complexity of chemical space and the requirement of multi-parameter optimization. Here we present an application scenario-oriented platform (ID4Idea) for molecule generation in different scenarios of drug discovery. This platform utilizes both library- or rule-based and generative algorithms (VAE, RNN, GAN, etc.), in combination with various AI learning types (pre-training, transfer learning, reinforcement learning, active learning, etc.) and input representations (1D SMILES, 2D graph, 3D shape, binding site, pharmacophore, etc.), to enable customized solutions for a given molecular design scenario. Besides the usual protocol of generation followed by screening, goal-directed molecule generation can also be conducted towards predefined goals, enhancing the efficiency of hit identification, lead finding, and lead optimization. We demonstrate the effectiveness of the ID4Idea platform through case studies, showcasing customized solutions for different design tasks using various input information, such as binding pockets, pharmacophores, and compound representations. In addition, remaining challenges are discussed to unlock the full potential of AI models in drug discovery and pave the way for the development of novel therapeutics.
Affiliation(s)
- Lianjun Zheng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Fangjun Shi
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Chunwang Peng
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Min Xu
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Fangda Fan
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Yuanpeng Li
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Lin Zhang
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Jiewen Du
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Zonghu Wang
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Zhixiong Lin
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Yina Sun
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Chenglong Deng
- Jingtai Zhiyao Technology (Shanghai) Co., Ltd. (XtalPi), No. 207 Huanqiao Road, Pudong New Area, Shanghai 201315, China
- Xinli Duan
- XtalPi Innovation Center, XtalPi Inc., Beijing, China
- Lin Wei
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Lei Fang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
- Peiyu Zhang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China
| | - Songling Ma
- XtalPi Innovation Center, XtalPi Inc., Beijing, China.
| | - Lipeng Lai
- XtalPi Innovation Center, XtalPi Inc., Beijing, China.
| | - Mingjun Yang
- Shenzhen Jingtai Technology Co., Ltd. (XtalPi), Floor 3, Sf Industrial Plant, No. 2 Hongliu Road, Fubao Community, Fubao Street, Futian District, Shenzhen 518045, China.
| |
|
11
|
McGibbon M, Shave S, Dong J, Gao Y, Houston DR, Xie J, Yang Y, Schwaller P, Blay V. From intuition to AI: evolution of small molecule representations in drug discovery. Brief Bioinform 2023; 25:bbad422. [PMID: 38033290 PMCID: PMC10689004 DOI: 10.1093/bib/bbad422] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 09/01/2023] [Revised: 10/13/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.
Affiliation(s)
- Miles McGibbon
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Steven Shave
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China
| | - Yumiao Gao
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Jiancong Xie
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Yuedong Yang
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Vincent Blay
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| |
|
12
|
Che L, Jin Y, Shi Y, Yu X, Sun H, Liu H, Li X. A drug molecular classification model based on graph structure generation. J Biomed Inform 2023; 145:104447. [PMID: 37481052 DOI: 10.1016/j.jbi.2023.104447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 07/14/2023] [Accepted: 07/16/2023] [Indexed: 07/24/2023]
Abstract
Molecular property prediction based on artificial intelligence technology has significant prospects in speeding up drug discovery and reducing drug discovery costs. Among them, molecular property prediction based on graph neural networks (GNNs) has received extensive attention in recent years. However, the existing graph neural networks still face the following challenges in node representation learning. First, the number of nodes increases exponentially with the expansion of the perception field, which limits the exploration ability of the model in the depth direction. Secondly, the large number of nodes in the perception field brings noise, which is not conducive to the model's representation learning of the key structures. Therefore, a graph neural network model based on structure generation is proposed in this paper. The model adopts the depth-first strategy to generate the key structures of the graph, to solve the problem of insufficient exploration ability of the graph neural network in the depth direction. A tendentious node selection method is designed to gradually select nodes and edges to generate the key structures of the graph, to solve the noise problem caused by the excessive number of nodes. In addition, the model skillfully realizes forward propagation and iterative optimization of structure generation by using an attention mechanism and random bias. Experimental results on public data sets show that the proposed model achieves better classification results than the existing best models.
Affiliation(s)
- Lixuan Che
- College of Culture and Creativity, Weifang Vocational College, Weifang, China.
| | - Yide Jin
- Department of Statistics, University of Minnesota, Minneapolis, MN, USA.
| | - Yuliang Shi
- School of Software, Shandong University, Jinan, China; Dareway Software Co., Ltd, Jinan, China.
| | - Xiaojing Yu
- Department of Dermatology, Qilu Hospital, Shandong University, Jinan, China.
| | - Hongfeng Sun
- School of Data and Computer Science, Shandong Women's University, Jinan, China.
| | - Hui Liu
- School of Data and Computer Science, Shandong Women's University, Jinan, China.
| | - Xinyu Li
- Department of Dermatology, Qilu Hospital, Shandong University, Jinan, China.
| |
|
13
|
Xiang Y, Tang YH, Lin G, Reker D. Interpretable Molecular Property Predictions Using Marginalized Graph Kernels. J Chem Inf Model 2023; 63:4633-4640. [PMID: 37504964 DOI: 10.1021/acs.jcim.3c00396] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/29/2023]
Abstract
Marginalized graph kernels have shown competitive performance in molecular machine learning tasks but currently lack measures of interpretability, which are important to improve trust in the models, detect biases, and inform molecular optimization campaigns. We here conceive and implement two interpretability measures for Gaussian process regression using a marginalized graph kernel (GPR-MGK) to quantify (1) the contribution of specific training data to the prediction and (2) the contribution of specific nodes of the graph to the prediction. We demonstrate the applicability of these interpretability measures for molecular property prediction. We compare GPR-MGK to graph neural networks on four logic and two real-world toxicology data sets and find that the atomic attribution of GPR-MGK generally outperforms the atomic attribution of graph neural networks. We also perform a detailed molecular attribution analysis using the FreeSolv data set, showing how molecules in the training set influence machine learning predictions and why Morgan fingerprints perform poorly on this data set. This is the first systematic examination of the interpretability of GPR-MGK and thereby is an important step in the further maturation of marginalized graph kernel methods for interpretable molecular predictions.
Affiliation(s)
- Yan Xiang
- Department of Biomedical Engineering, Duke University, Durham, North Carolina 27705, United States
| | - Yu-Hang Tang
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
| | - Guang Lin
- Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States
| | - Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, North Carolina 27705, United States
| |
|
14
|
Yokogawa D, Suda K. Interpretable Attribution Assignment for Octanol-Water Partition Coefficient. J Phys Chem B 2023; 127:7004-7010. [PMID: 37498912 DOI: 10.1021/acs.jpcb.3c02740] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 07/29/2023]
Abstract
With the increasing development of machine learning models, their credibility has become an important issue. In chemistry, attribution assignment is gaining relevance when it comes to designing molecules and debugging models. However, attention has only been paid to which atoms are important in the prediction and not to whether the attribution is reasonable. In this study, we developed a graph neural network model, a highly interpretable attribution model in chemistry, and modified the integrated gradients method. The credibility of our approach was confirmed by predicting the octanol-water partition coefficient (logP) and evaluating the three metrics (accuracy, consistency, and stability) in the attribution assignment.
Affiliation(s)
- Daisuke Yokogawa
- Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba Meguro-ku, Tokyo 153-8902, Japan
| | - Kayo Suda
- Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba Meguro-ku, Tokyo 153-8902, Japan
| |
|
15
|
Amara K, Rodríguez-Pérez R, Jiménez-Luna J. Explaining compound activity predictions with a substructure-aware loss for graph neural networks. J Cheminform 2023; 15:67. [PMID: 37491407 PMCID: PMC10369817 DOI: 10.1186/s13321-023-00733-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Accepted: 07/08/2023] [Indexed: 07/27/2023] Open
Abstract
Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices to identify which molecular substructures are responsible for a predicted property change. However, established molecular feature attribution methods have so far displayed low performance for popular deep learning algorithms such as graph neural networks (GNNs), especially when compared with simpler modeling alternatives such as random forests coupled with atom masking. To mitigate this problem, a modification of the regression objective for GNNs is proposed to specifically account for common core structures between pairs of molecules. The presented approach shows higher accuracy on a recently-proposed explainability benchmark. This methodology has the potential to assist with model explainability in drug discovery pipelines, particularly in lead optimization efforts where specific chemical series are investigated.
Affiliation(s)
- Kenza Amara
- Microsoft Research AI4Science, 21 Station Rd., Cambridge, CB1 2FB UK
- Department of Computer Science, ETH Zurich, Andreasstrasse 5, 8050 Zurich, Switzerland
| | | | - José Jiménez-Luna
- Microsoft Research AI4Science, 21 Station Rd., Cambridge, CB1 2FB UK
| |
|
16
|
Wu Z, Wang J, Du H, Jiang D, Kang Y, Li D, Pan P, Deng Y, Cao D, Hsieh CY, Hou T. Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking. Nat Commun 2023; 14:2585. [PMID: 37142585 PMCID: PMC10160109 DOI: 10.1038/s41467-023-38192-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 09/17/2022] [Accepted: 04/12/2023] [Indexed: 05/06/2023] Open
Abstract
Graph neural networks (GNNs) have been widely used in molecular property prediction, but explaining their black-box predictions is still a challenge. Most existing explanation methods for GNNs in chemistry focus on attributing model predictions to individual nodes, edges or fragments that are not necessarily derived from a chemically meaningful segmentation of molecules. To address this challenge, we propose a method named substructure mask explanation (SME). SME is based on well-established molecular segmentation methods and provides an interpretation that aligns with the understanding of chemists. We apply SME to elucidate how GNNs learn to predict aqueous solubility, genotoxicity, cardiotoxicity and blood-brain barrier permeation for small molecules. SME provides interpretation that is consistent with the understanding of chemists, alerts them to unreliable performance, and guides them in structural optimization for target properties. Hence, we believe that SME empowers chemists to confidently mine structure-activity relationship (SAR) from reliable GNNs through a transparent inspection on how GNNs pick up useful signals when learning from data.
Affiliation(s)
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
| | - Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
- National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, Hubei, P.R. China
| | - Hongyan Du
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
| | - Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
| | - Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
| | - Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China
| | - Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018, Zhejiang, P.R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, Hunan, P.R. China.
| | - Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, P.R. China.
| |
|
17
|
Wellawatte G, Gandhi HA, Seshadri A, White AD. A Perspective on Explanations of Molecular Prediction Models. J Chem Theory Comput 2023; 19:2149-2160. [PMID: 36972469 PMCID: PMC10134429 DOI: 10.1021/acs.jctc.2c01235] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 12/06/2022] [Indexed: 03/29/2023]
Abstract
Chemists can be skeptical in using deep learning (DL) in decision making, due to the lack of interpretability in "black-box" models. Explainable artificial intelligence (XAI) is a branch of artificial intelligence (AI) which addresses this drawback by providing tools to interpret DL models and their predictions. We review the principles of XAI in the domain of chemistry and emerging methods for creating and evaluating explanations. Then, we focus on methods developed by our group and their applications in predicting solubility, blood-brain barrier permeability, and the scent of molecules. We show that XAI methods like chemical counterfactuals and descriptor explanations can explain DL predictions while giving insight into structure-property relationships. Finally, we discuss how a two-step process of developing a black-box model and explaining predictions can uncover structure-property relationships.
Affiliation(s)
- Geemi P. Wellawatte
- Department of Chemistry, University of Rochester, Rochester, New York 14627, United States
| | - Heta A. Gandhi
- Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, United States
| | - Aditi Seshadri
- Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, United States
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, United States
| |
|
18
|
Zeng Y, Yin R, Luo M, Chen J, Pan Z, Lu Y, Yu W, Yang Y. Identifying spatial domain by adapting transcriptomics with histology through contrastive learning. Brief Bioinform 2023; 24:7035112. [PMID: 36781228 DOI: 10.1093/bib/bbad048] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/27/2022] [Revised: 12/26/2022] [Accepted: 01/23/2023] [Indexed: 02/15/2023] Open
Abstract
Recent advances in spatial transcriptomics have enabled measurements of gene expression at cell/spot resolution meanwhile retaining both the spatial information and the histology images of the tissues. Accurately identifying the spatial domains of spots is a vital step for various downstream tasks in spatial transcriptomics analysis. To remove noises in gene expression, several methods have been developed to combine histopathological images for data analysis of spatial transcriptomics. However, these methods either use the image only for the spatial relations for spots, or individually learn the embeddings of the gene expression and image without fully coupling the information. Here, we propose a novel method ConGI to accurately exploit spatial domains by adapting gene expression with histopathological images through contrastive learning. Specifically, we designed three contrastive loss functions within and between two modalities (the gene expression and image data) to learn the common representations. The learned representations are then used to cluster the spatial domains on both tumor and normal spatial transcriptomics datasets. ConGI was shown to outperform existing methods for the spatial domain identification. In addition, the learned representations have also been shown powerful for various downstream tasks, including trajectory inference, clustering, and visualization.
Affiliation(s)
- Yuansong Zeng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Rui Yin
- Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou 510000, China
| | - Mai Luo
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Jianing Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Zixiang Pan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Weijiang Yu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Department of Computer Science, The University of Hong Kong, Hong Kong SAR, China
| |
|
19
|
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/19/2022] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
| | - Marko Cigler
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| | - Georg E. Winter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| |
|
20
|
Yang CI, Li YP. Explainable uncertainty quantifications for deep learning-based molecular property prediction. J Cheminform 2023; 15:13. [PMID: 36737786 PMCID: PMC9898940 DOI: 10.1186/s13321-023-00682-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/10/2022] [Accepted: 01/15/2023] [Indexed: 02/05/2023] Open
Abstract
Quantifying uncertainty in machine learning is important in new research areas with scarce high-quality data. In this work, we develop an explainable uncertainty quantification method for deep learning-based molecular property prediction. This method can capture aleatoric and epistemic uncertainties separately and attribute the uncertainties to atoms present in the molecule. The atom-based uncertainty method provides an extra layer of chemical insight to the estimated uncertainties, i.e., one can analyze individual atomic uncertainty values to diagnose the chemical component that introduces uncertainty to the prediction. Our experiments suggest that atomic uncertainty can detect unseen chemical structures and identify chemical species whose data are potentially associated with significant noise. Furthermore, we propose a post-hoc calibration method to refine the uncertainty quantified by ensemble models for better confidence interval estimates. This work improves uncertainty calibration and provides a framework for assessing whether and why a prediction should be considered unreliable.
Affiliation(s)
- Chu-I Yang
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617 Taiwan
| | - Yi-Pei Li
- Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617 Taiwan
- Taiwan International Graduate Program (TIGP), Academia Sinica, No. 128, Sec. 2, Academia Road, Taipei, 11529 Taiwan
| |
|
21
|
Rao J, Zheng S, Yang Y. Integrating supercomputing and artificial intelligence for life science. Patterns (N Y) 2022; 3:100653. [PMID: 36569549 PMCID: PMC9768675 DOI: 10.1016/j.patter.2022.100653] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Jiahua Rao and Shuangjia Zheng are Ph.D. students in Prof. Yang's lab (Supercomputing And AI for Life science, SAIL Lab) at Sun Yat-sen University. They recently developed an interpretable framework to quantitatively assess the interpretability of Graph Neural Network (GNN) and made comparison with medicinal chemists. Their meaningful benchmarking and rigorous framework would greatly benefit development of new interpretable methods in GNNs.
Affiliation(s)
- Jiahua Rao
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Shuangjia Zheng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Galixir Technologies Ltd, Beijing 100000, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, Guangzhou 510000, China
| |
|
22
|
Rahman A, Hossain MS, Muhammad G, Kundu D, Debnath T, Rahman M, Khan MSI, Tiwari P, Band SS. Federated learning-based AI approaches in smart healthcare: concepts, taxonomies, challenges and open issues. Cluster Comput 2022; 26:1-41. [PMID: 35996680 PMCID: PMC9385101 DOI: 10.1007/s10586-022-03658-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/02/2022] [Revised: 05/10/2022] [Accepted: 06/17/2022] [Indexed: 06/15/2023]
Abstract
Federated Learning (FL), Artificial Intelligence (AI), and Explainable Artificial Intelligence (XAI) are the most trending and exciting technology in the intelligent healthcare field. Traditionally, the healthcare system works based on centralized agents sharing their raw data. Therefore, huge vulnerabilities and challenges are still existing in this system. However, integrating with AI, the system would be multiple agent collaborators who are capable of communicating with their desired host efficiently. Again, FL is another interesting feature, which works decentralized manner; it maintains the communication based on a model in the preferred system without transferring the raw data. The combination of FL, AI, and XAI techniques can be capable of minimizing several limitations and challenges in the healthcare system. This paper presents a complete analysis of FL using AI for smart healthcare applications. Initially, we discuss contemporary concepts of emerging technologies such as FL, AI, XAI, and the healthcare system. We integrate and classify the FL-AI with healthcare technologies in different domains. Further, we address the existing problems, including security, privacy, stability, and reliability in the healthcare field. In addition, we guide the readers to solving strategies of healthcare using FL and AI. Finally, we address extensive research areas as well as future potential prospects regarding FL-based AI research in the healthcare management system.
Affiliation(s)
- Anichur Rahman
- Present Address: Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka, Bangladesh
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Md. Sazzad Hossain
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Ghulam Muhammad
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Dipanjali Kundu
- Present Address: Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka, Bangladesh
| | - Tanoy Debnath
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Muaz Rahman
- Present Address: Department of Computer Science and Engineering, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka, Bangladesh
| | - Md. Saikat Islam Khan
- Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
| | - Prayag Tiwari
- Department of Computer Science, Aalto University, Espoo, Finland
| | - Shahab S. Band
- Future Technology Research Center, National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin, 64002 Taiwan
| |
|