1
Li S, Zhang M, Sun P. Prediction of acute toxicity of organic contaminants to fish: Model development and a novel approach to identify reactive substructures. JOURNAL OF HAZARDOUS MATERIALS 2025; 491:137917. [PMID: 40086249 DOI: 10.1016/j.jhazmat.2025.137917] [Received: 01/15/2025] [Revised: 03/06/2025] [Accepted: 03/10/2025] [Indexed: 03/16/2025]
Abstract
In this study, count-based Morgan fingerprints (CMF) were employed to represent the fundamental chemical structures of contaminants, and a neural network model (R² = 0.76) was developed to predict the acute fish toxicity (AFT) of organic compounds. Models based on CMF consistently outperformed those based on binary Morgan fingerprints (BMF), likely because BMF describe homologous structures inefficiently. The similarity of CMF was calculated using an improved method based on Tanimoto distance, which was then used for dataset partitioning and for defining the application domain. The similarity-based dataset partitioning method ensured structural diversity within the training set and improved performance on the validation set, demonstrating its potential for toxicological structure analysis and priority screening. Toxic substructures identified by the Shapley additive explanation (SHAP) method were substituted benzenes, long carbon chains, unsaturated carbons, and halogen atoms. By incorporating Kow and monitoring shifts in feature importance, the influence of substructures on AFT was further delineated, revealing their roles in facilitating exposure (e.g., long carbon chains) and reactive toxicity (e.g., methyl). Additionally, we compared the toxicity of similar substructures and of the same substructure in different chemical environments. To address SHAP's insensitivity to low-variance features, this study introduced a novel metric termed the toxicity index (TI), designed to pinpoint substructures that are present in minimal quantities yet potentially exhibit high toxicity. With TI, we identified several important substructures, such as parathion and polycyclic substituents. Finally, prevalent toxic substructures and potential highly toxic substances were identified in two external datasets.
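The abstract does not spell out the "improved method based on Tanimoto distance", so the following is only a minimal sketch of why count-based fingerprints can separate homologous structures that binary fingerprints cannot. It assumes the common min/max generalization of Tanimoto similarity for count vectors, which may differ from the authors' method. The substructure labels and counts are hypothetical stand-ins for fingerprint bits, not real Morgan fingerprint output.

```python
from collections import Counter


def count_tanimoto(a: Counter, b: Counter) -> float:
    """Min/max (generalized) Tanimoto similarity between two count vectors.

    Assumption: this is one common generalization, not necessarily the
    'improved method' used in the study.
    """
    keys = set(a) | set(b)
    numerator = sum(min(a[k], b[k]) for k in keys)
    denominator = sum(max(a[k], b[k]) for k in keys)
    return numerator / denominator if denominator else 1.0


def binary_tanimoto(a: Counter, b: Counter) -> float:
    """Classic Tanimoto similarity using only presence/absence of each key."""
    on_a = {k for k, v in a.items() if v > 0}
    on_b = {k for k, v in b.items() if v > 0}
    union = on_a | on_b
    return len(on_a & on_b) / len(union) if union else 1.0


# Hypothetical substructure counts for two homologous n-alkanes.
hexane = Counter({"CH3": 2, "CH2": 4})
dodecane = Counter({"CH3": 2, "CH2": 10})

print(binary_tanimoto(hexane, dodecane))  # 1.0: binary bits cannot tell them apart
print(count_tanimoto(hexane, dodecane))   # 0.5: counts reflect the chain-length difference
```

In this toy case the binary representation treats the two homologues as identical, while the count-based similarity drops to 0.5, which is consistent with the abstract's remark about homologous structures.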
Affiliation(s)
- Shangyu Li
  - School of Environmental Science and Engineering, Tianjin University, Tianjin 300072, China
- Mingming Zhang
  - Heibei Key Laboratory of Metabolic Diseases, Heibei, China
- Peizhe Sun
  - School of Environmental Science and Engineering, Tianjin University, Tianjin 300072, China
2
Yue J, Pang H, Wei R, Hu C, Qu J. Machine Learning-Assisted Molecular Structure Embedding for Accurate Prediction of Emerging Contaminant Removal by Ozonation Oxidation. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2025; 59:9298-9311. [PMID: 40311064 DOI: 10.1021/acs.est.4c14193] [Indexed: 05/03/2025]
Abstract
Ozone has demonstrated high efficacy in degrading emerging contaminants (ECs) during drinking water treatment. However, traditional quantitative structure-activity relationship (QSAR) models often fall short in effectively normalizing and characterizing diverse molecular structures, thereby limiting their predictive accuracy for the removal of various ECs. This study uses embedded molecular structure vectors generated by a graph neural network (GNN), combined with functional group prompts, as inputs to a feedforward neural network. A data set of 28 ECs and 542 data points, representing diverse molecular structures and physicochemical properties, was built to predict the residual rate of ECs (REC) in ozonation oxidation. Compared to traditional QSAR models, the GNN-based molecular structure embedding methods significantly improve prediction accuracy. The resulting KANO-EC model achieved an R² of 0.97 for REC, demonstrating its ability to capture complex structural features. Moreover, KANO-EC maintains exceptional interpretability, elucidating key functional groups (e.g., carbonyls, hydroxyls, aromatic rings, and amines) involved in the oxidation mechanism. This study presents the KANO-EC model as a novel approach for predicting the ozonation removal efficiency of current and potential ECs. The model also provides valuable insights for developing efficient control strategies for ensuring the long-term safety and sustainability of drinking water supplies.
Affiliation(s)
- Jiapeng Yue
  - State Key Laboratory of Environmental Aquatic Chemistry, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Hongjiao Pang
  - State Key Laboratory of Environmental Aquatic Chemistry, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Renke Wei
  - State Key Laboratory of Environmental Aquatic Chemistry, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Chengzhi Hu
  - State Key Laboratory of Environmental Aquatic Chemistry, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jiuhui Qu
  - State Key Laboratory of Environmental Aquatic Chemistry, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
3
Pal S, Hanson QM, Ogden SC, Lee EM, Martinez NJ, Zakharov AV. Discovery of SARS-CoV-2 Nsp14-Methyltransferase (MTase) Inhibitors by Harnessing Scaffold-Centric Exploration of the Ultra Large Chemical Space. ACS Pharmacol Transl Sci 2025; 8:1366-1400. [PMID: 40370981 PMCID: PMC12070326 DOI: 10.1021/acsptsci.5c00111] [Received: 02/10/2025] [Revised: 04/09/2025] [Accepted: 04/15/2025] [Indexed: 05/16/2025]
Abstract
The global impact of SARS-CoV-2 underscores the need for antiviral treatments beyond vaccines. This study targets Nsp14-MTase, a viral protein essential for replication. Initial quantitative high-throughput screening (qHTS) of ∼15,000 compounds from the selected NCATS in-house libraries identified 135 active hit molecules, reflecting a hit-rate of 1.04%. To enhance the search for promising antiviral agents, we expanded this screening campaign with two rounds of machine learning (ML)-based virtual screening of ∼130,000 compounds. The first iteration yielded 72 active compounds encompassing 27 chemotypes with an IC50 ranging from 1.45 μM to 33.27 μM, increasing the hit-rate 28-fold over the initial qHTS screen. Scaffold clustering of those hits revealed 27 chemotypes. The second iteration added 30 more hits (IC50: 2.18 μM-30.79 μM) across 12 new chemotypes. Initial structure-activity relationship (SAR) exploration around selected chemotypes identified NCGC00606183 (IC50: 0.41 μM) as the most potent hit. Hit-to-lead optimization using scaffold-centric exploration against the ultra large Enamine REAL Space (∼5.6 billion compounds) in HPC clusters identified 78 analogs, with 56 showing potent biochemical activity (IC50: 0.12 μM-18.23 μM) and cellular activity (0.27 μM-23.07 μM) in fully infectious SARS-CoV-2 live virus assays.
Affiliation(s)
- Sourav Pal
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland 20850, United States
- Quinlin M. Hanson
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland 20850, United States
- Sarah C. Ogden
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland 20850, United States
- Emily M. Lee
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland 20850, United States
- Natalia J. Martinez
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland 20850, United States
- Alexey V. Zakharov
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland 20850, United States
4
Kim S, Han M, Park J, Lee K, Park S. Machine Learning Prediction of Optical Properties of Coumarin Derivatives Using Gaussian-Weighted Graph Convolution and Subgraph Modular Input. J Chem Inf Model 2025. [PMID: 40334113 DOI: 10.1021/acs.jcim.5c00619] [Indexed: 05/09/2025]
Abstract
Coumarin derivatives have been widely developed and utilized as chromophores and fluorophores in various research fields. In this study, we constructed an experimental database of their optical properties (specifically, absorption and emission wavelengths measured in solution) and developed a machine learning (ML) model based on Gaussian-weighted graph convolution (GWGC) and subgraph modular input (SMI) to predict these properties. The GWGC was introduced as a novel molecular representation that accounts for interatomic effects among neighboring atoms when predicting the optical properties of coumarin derivatives. The SMI was introduced to represent coumarin derivatives as subgraphs composed of a coumarin core and six substituents, thereby modularizing the molecular vector into a core vector and substituent vectors. This approach encodes the separate chemical information on the core and substituents as well as the positional information on the substituents, facilitating an understanding of how each substituent influences the optical properties of the coumarin core. ML models leveraging GWGC and SMI outperformed those based on RDKit descriptors and count-based Morgan fingerprints. The ML models with GWGC and SMI can be generally applied to predict properties of molecules composed of a core structure and its various substituents.
Affiliation(s)
- Seokwoo Kim
  - Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
- Minhi Han
  - Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
- Jinyong Park
  - Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
- Kiwoong Lee
  - Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
- Sungnam Park
  - Department of Chemistry and Research Institute for Natural Science, Korea University, Seoul 02841, Korea
5
Rasmussen MH, Strandgaard M, Seumer J, Hemmingsen LK, Frei A, Balcells D, Jensen JH. SMILES all around: structure to SMILES conversion for transition metal complexes. J Cheminform 2025; 17:63. [PMID: 40296090 PMCID: PMC12039060 DOI: 10.1186/s13321-025-01008-1] [Received: 01/16/2025] [Accepted: 03/31/2025] [Indexed: 04/30/2025]
Abstract
We present a method for creating RDKit-parsable SMILES for transition metal complexes (TMCs) based on xyz-coordinates and overall charge of the complex. This can be viewed as an extension to the program xyz2mol that does the same for organic molecules. The only dependency is RDKit, which makes it widely applicable. One thing that has been lacking when it comes to generating SMILES from structure for TMCs is an existing SMILES dataset to compare with, so sanity-checking a method has required manual work. Therefore, we also generate SMILES in two other ways: one where ligand charges and TMC connectivity are based on natural bond orbital (NBO) analysis from density functional theory (DFT) calculations utilizing recent work by Kneiding et al. (Digit Discov 2: 618-633, 2023), and another that fixes SMILES available through the Cambridge Structural Database (CSD), making them parsable by RDKit. We compare these three different ways of obtaining SMILES for a subset of the CSD (tmQMg) and find >70% agreement for all three pairs. We utilize these SMILES to make simple molecular fingerprint (FP) and graph-based representations of the molecules to be used in the context of machine learning. Comparing with the graphs made by Kneiding et al., where nodes and edges are featurized with DFT properties, we find that depending on the target property (polarizability, HOMO-LUMO gap, or dipole moment) the SMILES-based representations can perform equally well. This makes them very suitable as baseline models. Finally, we present a dataset of 227k RDKit-parsable SMILES for mononuclear TMCs in the CSD.
Scientific contribution We present a method that can create RDKit-parsable SMILES strings of transition metal complexes (TMCs) from Cartesian coordinates and use it to create a dataset of 227k TMC SMILES strings. The RDKit-parsability allows us to perform machine learning studies of TMC properties using "standard" molecular representations such as fingerprints and 2D-graph convolution. We show that these relatively simple representations can perform quite well depending on the target property.
Affiliation(s)
- Maria H Rasmussen
  - Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
- Julius Seumer
  - Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
- Angelo Frei
  - Department of Chemistry, University of York, York, UK
- David Balcells
  - Department of Chemistry, University of Oslo, Oslo, Norway
- Jan H Jensen
  - Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
6
Pal S, Nance KD, Joshi DR, Kales SC, Ye L, Hu X, Shamim K, Zakharov AV. Applications of Machine Learning Approaches for the Discovery of SARS-CoV-2 PLpro Inhibitors. J Chem Inf Model 2025; 65:1338-1356. [PMID: 39818814 DOI: 10.1021/acs.jcim.4c02126] [Indexed: 01/19/2025]
Abstract
The global impact of SARS-CoV-2 highlights the need for treatments beyond vaccination, given the limited availability of effective medications. While Pfizer introduced Paxlovid, an FDA-approved antiviral targeting the SARS-CoV-2 main protease (Mpro), this study focuses on designing new antivirals against another protease, papain-like protease (PLpro), which is crucial for viral replication and immune suppression. NCATS/NIH performed a high-throughput screen of ∼15,000 molecules from an internal molecular library, identifying initial hits with a 0.5% success rate. To improve the hit rate and identify potent inhibitors, machine learning-based virtual screens were applied to ∼150,000 compounds, yielding 125 top predicted hits. Biochemical evaluation revealed 25 promising compounds, with a 20% hit-rate and IC50 values from 1.75 μM to <36 μM across 13 chemotypes. Further analog screening of those chemotypes, as part of the structure-activity relationships, led to 20 additional hits. Additionally, the hit-to-lead optimization of chemotype 7 produced 10 more analogs. These PLpro inhibitors provide promising templates for antiviral development against COVID-19.
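The hit rates quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above (25 confirmed hits among 125 predicted compounds, and a 0.5% rate for the initial screen). The fold-enrichment value is derived here for illustration and is not a number reported in the abstract.

```python
def hit_rate(actives: int, tested: int) -> float:
    """Fraction of tested compounds that were confirmed active."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return actives / tested


# Figures taken from the abstract.
ml_rate = hit_rate(25, 125)   # ML-guided virtual screen: 25 of 125 top predictions
baseline_rate = 0.005         # initial high-throughput screen: 0.5%

# Derived quantity (not stated in the abstract): fold improvement over baseline.
fold_enrichment = ml_rate / baseline_rate

print(f"ML-guided hit rate: {ml_rate:.0%}")
print(f"Fold enrichment over the initial screen: {fold_enrichment:.0f}x")
```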
Affiliation(s)
- Sourav Pal
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, Maryland 20850, United States
- Kellie D Nance
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, Maryland 20850, United States
- Dirgha Raj Joshi
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, Maryland 20850, United States
- Stephen C Kales
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, Maryland 20850, United States
- Lin Ye
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, Maryland 20850, United States
- Xin Hu
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, Maryland 20850, United States
- Khalida Shamim
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, Maryland 20850, United States
- Alexey V Zakharov
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Rockville, Maryland 20850, United States
7
Chang HC, Tsai MH, Li YP. Enhancing Activation Energy Predictions under Data Constraints Using Graph Neural Networks. J Chem Inf Model 2025; 65:1367-1377. [PMID: 39862160 PMCID: PMC11815826 DOI: 10.1021/acs.jcim.4c02319] [Received: 12/11/2024] [Revised: 01/14/2025] [Accepted: 01/14/2025] [Indexed: 01/27/2025]
Abstract
Accurately predicting activation energies is crucial for understanding chemical reactions and modeling complex reaction systems. However, the high computational cost of quantum chemistry methods often limits the feasibility of large-scale studies, leading to a scarcity of high-quality activation energy data. In this work, we explore and compare three innovative approaches (transfer learning, delta learning, and feature engineering) to enhance the accuracy of activation energy predictions using graph neural networks, specifically focusing on methods that incorporate low-cost, low-level computational data. Using the Chemprop model, we systematically evaluated how these methods leverage data from semiempirical quantum mechanics (SQM) calculations to improve predictions. Delta learning, which adjusts low-level SQM activation energies to align with high-level CCSD(T)-F12a targets, emerged as the most effective method, achieving high accuracy with substantially reduced data requirements. Notably, delta learning trained with just 20-30% of high-level data matched or exceeded the performance of other methods trained with full data sets, making it advantageous in data-scarce scenarios. However, its reliance on transition state searches imposes significant computational demands during model application. Transfer learning, which pretrains models on large data sets of low-level data, provided mixed results, particularly when there was a mismatch in the reaction distributions between the training and target data sets. Feature engineering, which involves adding computed molecular properties as input features, showed modest gains, particularly in thermodynamic properties. Our study highlights the trade-offs between accuracy and computational demand in selecting the best approach for enhancing activation energy predictions. 
These insights provide valuable guidelines for researchers aiming to apply machine learning in chemical reaction engineering, helping to balance accuracy with resource constraints.
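The delta-learning idea described in this abstract can be illustrated with a toy example: instead of predicting the high-level activation energy directly, a model learns the correction between the low-level and high-level values and adds it back at prediction time. The sketch below is an assumption-laden simplification. It replaces the graph neural network with a one-variable least-squares line, uses the low-level energy itself as the only feature, and runs on made-up numbers. It is not the Chemprop workflow used in the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data (arbitrary energy units, not from the paper).
e_low = [10.0, 20.0, 30.0, 40.0]    # cheap, low-level activation energies
e_high = [12.0, 24.0, 36.0, 48.0]   # expensive, high-level reference values

# Delta learning: the regression target is the correction, not e_high itself.
deltas = [high - low for high, low in zip(e_high, e_low)]
slope, intercept = fit_line(e_low, deltas)


def predict_high(e_low_new: float) -> float:
    """Low-level value plus the learned correction."""
    return e_low_new + (slope * e_low_new + intercept)


print(predict_high(50.0))
```

Note that a prediction still needs a low-level energy for the new reaction, which mirrors the abstract's point that delta learning depends on transition state searches at application time.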
Affiliation(s)
- Han-Chung Chang
  - Department of Chemical Engineering, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
- Ming-Hsuan Tsai
  - Department of Chemical Engineering, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
- Yi-Pei Li
  - Department of Chemical Engineering, National Taiwan University, No. 1, Section 4, Roosevelt Road, Taipei 10617, Taiwan
  - Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Section 2, Academia Road, Taipei 11529, Taiwan
8
Lin Y, Yang X, Zhang M, Cheng J, Lin H, Zhao Q. CLSSATP: Contrastive learning and self-supervised learning model for aquatic toxicity prediction. AQUATIC TOXICOLOGY (AMSTERDAM, NETHERLANDS) 2025; 279:107244. [PMID: 39805255 DOI: 10.1016/j.aquatox.2025.107244] [Received: 10/30/2024] [Revised: 12/17/2024] [Accepted: 01/07/2025] [Indexed: 01/16/2025]
Abstract
As compound concentrations in aquatic environments increase, the habitat degradation of aquatic organisms underscores the growing importance of studying the impact of chemicals on diverse aquatic populations. Understanding the potential impacts of different chemical substances on different species is necessary for protecting the environment and ensuring sustainable human development. In this regard, deep learning methods offer significant advantages over traditional experimental approaches in terms of cost, accuracy, and generalization ability. This research introduces CLSSATP, an efficient contrastive self-supervised learning deep neural network prediction model for organic toxicity. The model integrates two modules: a self-supervised learning module using molecular fingerprints for representation, and a contrastive learning module utilizing molecular graphs. Through dual-perspective learning, the model gains clear insights into the structural and property relationships of molecules. The experimental results indicate that our model outperforms comparative methods, demonstrating the effectiveness of our proposed architecture. Moreover, ablation experiments show that the self-supervised module and the contrastive learning module provide average performance improvements of 9.43% and 10.98%, respectively, to CLSSATP. Furthermore, by visualizing the representations of our model, we observe that it correctly identifies the substructures that determine the molecular properties, which lends it interpretability. In conclusion, CLSSATP offers a novel and effective perspective for future research in aquatic toxicity assessment. All code and datasets are freely available online at https://github.com/zhaoqi106/CLSSATP.
Affiliation(s)
- Ye Lin
  - College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Xin Yang
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Mingxuan Zhang
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
  - School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Jinyan Cheng
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
- Hai Lin
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
- Qi Zhao
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
9
Yang X, Sun J, Jin B, Lu Y, Cheng J, Jiang J, Zhao Q, Shuai J. Multi-task aquatic toxicity prediction model based on multi-level features fusion. J Adv Res 2025; 68:477-489. [PMID: 38844122 PMCID: PMC11785906 DOI: 10.1016/j.jare.2024.06.002] [Received: 02/18/2024] [Revised: 05/21/2024] [Accepted: 06/02/2024] [Indexed: 06/09/2024]
Abstract
INTRODUCTION With the escalating menace of organic compounds in environmental pollution imperiling the survival of aquatic organisms, the investigation of organic compound toxicity across diverse aquatic species assumes paramount significance for environmental protection. Understanding how different species respond to these compounds helps assess the potential ecological impact of pollution on aquatic ecosystems as a whole. Compared with traditional experimental methods, deep learning methods have higher accuracy in predicting aquatic toxicity, faster data processing speed and better generalization ability. OBJECTIVES This article presents ATFPGT-multi, an advanced multi-task deep neural network prediction model for organic toxicity. METHODS The model integrates molecular fingerprints and molecule graphs to characterize molecules, enabling the simultaneous prediction of acute toxicity for the same organic compound across four distinct fish species. Furthermore, to validate the advantages of multi-task learning, we independently construct prediction models, named ATFPGT-single, for each fish species. We employ cross-validation in our experiments to assess the performance and generalization ability of ATFPGT-multi. RESULTS The experimental results indicate, first, that ATFPGT-multi outperforms ATFPGT-single on four fish datasets with AUC improvements of 9.8%, 4%, 4.8%, and 8.2%, respectively, demonstrating the superiority of multi-task learning over single-task learning. Furthermore, in comparison with previous algorithms, ATFPGT-multi outperforms comparative methods, emphasizing that our approach exhibits higher accuracy and reliability in predicting aquatic toxicity. Moreover, ATFPGT-multi utilizes attention scores to identify molecular fragments associated with fish toxicity in organic molecules, as demonstrated by two organic molecule examples in the main text, demonstrating the interpretability of ATFPGT-multi. 
CONCLUSION In summary, ATFPGT-multi provides important support and reference for the further development of aquatic toxicity assessment. All code and datasets are freely available online at https://github.com/zhaoqi106/ATFPGT-multi.
Affiliation(s)
- Xin Yang
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Jianqiang Sun
  - School of Information Science and Engineering, Linyi University, Linyi 276000, China
- Bingyu Jin
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Yuer Lu
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Jinyan Cheng
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
- Jiaju Jiang
  - College of Life Sciences, Sichuan University, Chengdu 610064, China
- Qi Zhao
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Jianwei Shuai
  - Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou 325001, China
10
Li C, Li G. DynHeter-DTA: Dynamic Heterogeneous Graph Representation for Drug-Target Binding Affinity Prediction. Int J Mol Sci 2025; 26:1223. [PMID: 39940990 PMCID: PMC11818550 DOI: 10.3390/ijms26031223] [Received: 12/13/2024] [Revised: 01/27/2025] [Accepted: 01/28/2025] [Indexed: 02/16/2025]
Abstract
In drug development, drug-target affinity (DTA) prediction is a key indicator for assessing the drug's efficacy and safety. Despite significant progress in deep learning-based affinity prediction approaches in recent years, there are still limitations in capturing the complex interactions between drugs and target receptors. To address this issue, a dynamic heterogeneous graph prediction model, DynHeter-DTA, is proposed in this paper, which fully leverages the complex relationships between drug-drug, protein-protein, and drug-protein interactions, allowing the model to adaptively learn the optimal graph structures. Specifically, (1) in the data processing layer, to better utilize the similarities and interactions between drugs and proteins, the model dynamically adjusts the connection strengths between drug-drug, protein-protein, and drug-protein pairs, constructing a variable heterogeneous graph structure, which significantly improves the model's expressive power and generalization performance; (2) in the model design layer, considering that the quantity of protein nodes significantly exceeds that of drug nodes, an approach leveraging Graph Isomorphism Networks (GIN) and Self-Attention Graph Pooling (SAGPooling) is proposed to enhance prediction efficiency and accuracy. Comprehensive experiments on the Davis, KIBA, and Human public datasets demonstrate that DynHeter-DTA exceeds the performance of previous models in drug-target interaction forecasting, providing an innovative solution for drug-target affinity prediction.
Affiliation(s)
- Changli Li
  - School of Artificial Intelligence, Nanjing University of Information Science & Technology, Nanjing 210044, China
11
Dangayach R, Jeong N, Demirel E, Uzal N, Fung V, Chen Y. Machine Learning-Aided Inverse Design and Discovery of Novel Polymeric Materials for Membrane Separation. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2025; 59:993-1012. [PMID: 39680111 PMCID: PMC11755723 DOI: 10.1021/acs.est.4c08298] [Received: 08/10/2024] [Revised: 12/03/2024] [Accepted: 12/04/2024] [Indexed: 12/17/2024]
Abstract
Polymeric membranes have been widely used for liquid and gas separation in various industrial applications over the past few decades because of their exceptional versatility and high tunability. Traditional trial-and-error methods for material synthesis are inadequate to meet the growing demands for high-performance membranes. Machine learning (ML) has demonstrated huge potential to accelerate the design and discovery of membrane materials. In this review, we cover the strengths and weaknesses of the traditional methods, followed by a discussion on the emergence of ML for developing advanced polymeric membranes. We describe methodologies for data collection, data preparation, the commonly used ML models, and the explainable artificial intelligence (XAI) tools implemented in membrane research. Furthermore, we explain the experimental and computational validation steps to verify the results provided by these ML models. Subsequently, we showcase successful case studies of polymeric membranes and emphasize inverse design methodology within an ML-driven structured framework. Finally, we conclude by highlighting the recent progress, challenges, and future research directions to advance ML research for next-generation polymeric membranes. With this review, we aim to provide a comprehensive guideline to researchers, scientists, and engineers, assisting in the implementation of ML in membrane research and accelerating the membrane design and material discovery process.
Affiliation(s)
- Raghav Dangayach
  - School of Civil & Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Nohyeong Jeong
  - School of Civil & Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Elif Demirel
  - School of Civil & Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Nigmet Uzal
  - School of Civil & Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
  - Department of Civil Engineering, Abdullah Gul University, 38039 Kayseri, Turkey
- Victor Fung
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Yongsheng Chen
  - School of Civil & Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
|
12
|
Lin C, Zhang H. Polymer Biodegradation in Aquatic Environments: A Machine Learning Model Informed by Meta-Analysis of Structure-Biodegradation Relationships. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2025; 59:1253-1263. [PMID: 39772517 PMCID: PMC11755772 DOI: 10.1021/acs.est.4c11282] [Received: 10/18/2024] [Revised: 12/21/2024] [Accepted: 12/23/2024] [Indexed: 01/11/2025]
Abstract
Polymers are widely produced and contribute significantly to environmental pollution due to their low recycling rates and persistence in natural environments. Biodegradable polymers, while promising for reducing environmental impact, account for less than 2% of total polymer production. To expand the availability of biodegradable polymers, research has explored structure-biodegradability relationships, yet most studies focus on specific polymers, necessitating further exploration across diverse polymers. This study addresses this gap by curating an extensive aerobic biodegradation data set of 74 polymers and 1779 data points drawn from both published literature and 28 sets of original experiments. We then conducted a meta-analysis to evaluate the effects of experimental conditions, polymer structure, and the combined impact of polymer structure and properties on biodegradation. Next, we developed a machine learning model to predict polymer biodegradation in aquatic environments. The model achieved a test-set R² of 0.66 using Morgan fingerprints, detailed experimental conditions, and thermal decomposition temperature (Td) as the input descriptors. The model's robustness was supported by a feature importance analysis, revealing that substructure R-O-R in polyethers and polysaccharides positively influenced biodegradation, while molecular weight, Td, substructure -OC(═O)- in polyesters and polyalkylene carbonates, side chains, and aromatic rings negatively impacted it. Additionally, validation against the meta-analysis findings confirmed that predictions for unseen test sets aligned with established empirical biodegradation knowledge. This study not only expands our understanding across diverse polymers but also offers a valuable tool for designing environmentally friendly polymers.
Affiliation(s)
- Chengrui Lin
  - Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Huichun Zhang
  - Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
|
13
|
Zhu M, Xiao Z, Zhang T, Lu G. Construction of interpretable ensemble learning models for predicting bioaccumulation parameters of organic chemicals in fish. JOURNAL OF HAZARDOUS MATERIALS 2025; 482:136606. [PMID: 39579709 DOI: 10.1016/j.jhazmat.2024.136606] [Received: 10/10/2024] [Revised: 11/14/2024] [Accepted: 11/19/2024] [Indexed: 11/25/2024]
Abstract
Accurate prediction of bioaccumulation parameters is essential for assessing exposure, hazards, and risks of chemicals. However, the majority of prediction models on bioaccumulation parameters are individual models based on a single algorithm and lack model interpretation, resulting in unsatisfactory prediction accuracy due to inherent constraints of the algorithm and weak interpretability. Ensemble learning (EL), which combines multiple algorithms, coupled with the SHapley Additive exPlanation (SHAP) method, may overcome these limitations. Herein, EL models were constructed for three bioaccumulation parameters using datasets covering 2496 chemicals. The EL models demonstrated superior prediction accuracy compared to both individual models developed in this study and those from previous research, achieving a coefficient of determination of up to 0.861 on the validation sets. Applicability domains were characterized using a structure-activity landscape-based (abbreviated as ADSAL) methodology. The optimal EL models, together with the ADSAL, were successfully used to predict bioaccumulation parameters for 4374 chemicals included in the Inventory of Existing Chemical Substances of China. Model interpretation using the SHAP method offered insight into key features influencing bioaccumulation potential, including hydrophobicity, water solubility, polarizability, ionization potential, weight, and volume of molecules. Overall, the study provides data and models to support the sound management and risk assessment of chemicals.
Affiliation(s)
- Minghua Zhu
  - Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210098, China; College of Environment, Hohai University, Nanjing 210098, China
- Zijun Xiao
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tao Zhang
  - State Key Laboratory of Urban Water Resources and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Guanghua Lu
  - Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210098, China; College of Environment, Hohai University, Nanjing 210098, China.
|
14
|
Qin W, Zheng S, Guo K, Yang M, Fang J. Predicting reaction kinetics of reactive bromine species with organic compounds by machine learning: Feature combination and knowledge transfer with reactive chlorine species. JOURNAL OF HAZARDOUS MATERIALS 2024; 480:136410. [PMID: 39509874 DOI: 10.1016/j.jhazmat.2024.136410] [Received: 08/26/2024] [Revised: 10/19/2024] [Accepted: 11/04/2024] [Indexed: 11/15/2024]
Abstract
Reactive bromine species (RBS) such as bromine atom (Br•) and dibromine radical (Br₂•⁻) are important oxidative species accounting for the transformation of organic compounds in bromide-containing water. This study developed quantitative structure-activity relationship (QSAR) models to predict second-order rate constants (k) of RBS by machine learning (ML) and conducted knowledge transfer between RBS and reactive chlorine species (RCS, e.g., Cl• and Cl₂•⁻) to improve model performance. The ML-based models (test RMSE = 0.476-0.712) outperformed the multiple linear regression-based models (test RMSE = 0.572-3.68) for predicting k of RBS. In addition, the combination of molecular fingerprints (MFs) and quantum descriptors (QDs) as input features improved the performance of ML-based models (test RMSE = 0.476-0.712) compared to those developed by MFs (test RMSE = 0.524-0.834) or QDs (test RMSE = 0.572-0.806) alone. E_HOMO and E_gap were identified to be the most important features affecting k of RBS based on SHAP analysis. A unified model integrating the datasets of four reactive halogen species (RHS, e.g., Br•, Br₂•⁻, Cl• and Cl₂•⁻) was further developed (test R² = 0.802), which showed better predictive performance than the individual models (test R² = 0.521-0.776). Meanwhile, the model performance changed differently by employing knowledge transfer among RHS, which was improved for Br•/Cl•, mixed for Br•/Br₂•⁻ and Cl•/Cl₂•⁻, but worse for Br₂•⁻/Cl₂•⁻. This study provides useful tools for predicting k of RHS in aqueous environments.
Affiliation(s)
- Wenlei Qin
  - Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, School of Environmental Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China
- Shanshan Zheng
  - School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen 518060, China
- Kaiheng Guo
  - Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, School of Environmental Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China
- Ming Yang
  - HFI Huafu International, Guangzhou 510641, China
- Jingyun Fang
  - Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, School of Environmental Science and Engineering, Sun Yat-Sen University, Guangzhou 510275, China.
|
15
|
Dablander M, Hanser T, Lambiotte R, Morris GM. Sort & Slice: a simple and superior alternative to hash-based folding for extended-connectivity fingerprints. J Cheminform 2024; 16:135. [PMID: 39627861 PMCID: PMC11616156 DOI: 10.1186/s13321-024-00932-y] [Received: 03/04/2024] [Accepted: 11/12/2024] [Indexed: 12/06/2024] Open
Abstract
Extended-connectivity fingerprints (ECFPs) are a ubiquitous tool in current cheminformatics and molecular machine learning, and one of the most prevalent molecular feature extraction techniques used for chemical prediction. Atom features learned by graph neural networks can be aggregated to compound-level representations using a large spectrum of graph pooling methods. In contrast, sets of detected ECFP substructures are by default transformed into bit vectors using only a simple hash-based folding procedure. We introduce a general mathematical framework for the vectorisation of structural fingerprints via a formal operation called substructure pooling that encompasses hash-based folding, algorithmic substructure selection, and a wide variety of other potential techniques. We go on to describe Sort & Slice, an easy-to-implement and bit-collision-free alternative to hash-based folding for the pooling of ECFP substructures. Sort & Slice first sorts ECFP substructures according to their relative prevalence in a given set of training compounds and then slices away all but the L most frequent substructures which are subsequently used to generate a binary fingerprint of desired length, L. We computationally compare the performance of hash-based folding, Sort & Slice, and two advanced supervised substructure-selection schemes (filtering and mutual-information maximisation) for ECFP-based molecular property prediction. Our results indicate that, despite its technical simplicity, Sort & Slice robustly (and at times substantially) outperforms traditional hash-based folding as well as the other investigated substructure-pooling methods across distinct prediction tasks, data splitting techniques, machine-learning models and ECFP hyperparameters. We thus recommend that Sort & Slice canonically replace hash-based folding as the default substructure-pooling technique to vectorise ECFPs for supervised molecular machine learning. 
Scientific contribution: A general mathematical framework for the vectorisation of structural fingerprints called substructure pooling; and the technical description and computational evaluation of Sort & Slice, a conceptually simple and bit-collision-free method for the pooling of ECFP substructures that robustly and markedly outperforms classical hash-based folding at molecular property prediction.
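The Sort & Slice procedure described in this abstract (rank substructures by how many training compounds contain them, keep the L most frequent, emit one bit per kept substructure) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `fit_sort_and_slice` and `transform` are hypothetical, each compound is assumed to arrive as a pre-computed set of substructure identifiers (in practice these would come from an ECFP generator), and the tie-breaking rule for equally frequent substructures is an assumption made only to keep the output deterministic.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence


def fit_sort_and_slice(train_substructures: Sequence[Iterable[Hashable]],
                       length: int) -> List[Hashable]:
    """Return the `length` substructures found in the most training compounds."""
    counts: Counter = Counter()
    for subs in train_substructures:
        # Count each substructure once per compound (prevalence, not occurrences).
        counts.update(set(subs))
    # Sort by descending prevalence; ties broken by repr (an assumption).
    ranked = sorted(counts, key=lambda s: (-counts[s], repr(s)))
    # Slice away everything beyond the `length` most frequent substructures.
    return ranked[:length]


def transform(substructures: Iterable[Hashable],
              vocabulary: Sequence[Hashable],
              length: int) -> List[int]:
    """Binary fingerprint of size `length`: one dedicated bit per kept substructure."""
    index = {s: i for i, s in enumerate(vocabulary)}
    bits = [0] * length
    for s in substructures:
        i = index.get(s)
        if i is not None:  # substructures outside the vocabulary are ignored
            bits[i] = 1
    return bits


# Toy data: letters stand in for ECFP substructure identifiers.
train = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
vocab = fit_sort_and_slice(train, length=3)
fingerprint = transform({"b", "d", "z"}, vocab, length=3)
```

Because every kept substructure owns its own position, two different substructures can never share a bit, which is the bit-collision-free property the abstract contrasts with hash-based folding.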
Affiliation(s)
- Markus Dablander
  - Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter (550), Woodstock Road, Oxford, OX2 6GG, UK
- Thierry Hanser
  - Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK
- Renaud Lambiotte
  - Mathematical Institute, University of Oxford, Andrew Wiles Building, Radcliffe Observatory Quarter (550), Woodstock Road, Oxford, OX2 6GG, UK
- Garrett M Morris
  - Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK.
|
16
|
Liu Z, Lei J, Cheng L, Yang R, Yang Z, Shi B, Wang J, Zhang A, Liu Y. Intelligent optimal control model of selection pressure for rapid culture of aerobic granular sludge based on machine learning and simulated annealing algorithm. BIORESOURCE TECHNOLOGY 2024; 413:131509. [PMID: 39321933 DOI: 10.1016/j.biortech.2024.131509] [Received: 06/17/2024] [Revised: 07/30/2024] [Accepted: 09/19/2024] [Indexed: 09/27/2024]
Abstract
Aerobic granular sludge (AGS) has advantages over activated sludge (AS) but faces challenges with long granulation periods. In this study, a novel grey-box model is devised to optimize the cultivation of AGS to shorten the formation time. This model is based on an existing white-box model. The modeling process starts with the application of four sensitivity analysis methods to assess the 12 model metrics selected. Subsequently, 12 prediction models were constructed by combining six machine learning (ML) algorithms and integrated algorithms, and the best-performing model was selected (R² = 0.98). Finally, an AGS selection pressure planning model was designed in conjunction with a simulated annealing (SA) algorithm to guide AGS training. The results demonstrate that AGS formation could be achieved within four days under the model's optimal control. Therefore, the establishment of this model provides a new technique for the cultivation of AGS.
Affiliation(s)
- Zhe Liu
  - School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Yan Ta Road. No.13, Xi'an 710055, China; Key Lab of Northwest Water Resource, Environment and Ecology, Ministry of Education, Xi'an University of Architecture and Technology, Xi'an 710055, China.
- Jie Lei
  - School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Yan Ta Road. No.13, Xi'an 710055, China
- Linshan Cheng
  - School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Yan Ta Road. No.13, Xi'an 710055, China
- Rushuo Yang
  - School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Yan Ta Road. No.13, Xi'an 710055, China
- Zhuangzhuang Yang
  - School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Yan Ta Road. No.13, Xi'an 710055, China
- Bingrui Shi
  - School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Yan Ta Road. No.13, Xi'an 710055, China
- JiaXuan Wang
  - School of Architecture and Civil Engineering, Xi'an University of Science and Technology, Yan Ta Road, No. 58, Xi'an 710054, China
- Aining Zhang
  - School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Yan Ta Road. No.13, Xi'an 710055, China
- Yongjun Liu
  - School of Environmental and Municipal Engineering, Xi'an University of Architecture and Technology, Yan Ta Road. No.13, Xi'an 710055, China; Key Lab of Northwest Water Resource, Environment and Ecology, Ministry of Education, Xi'an University of Architecture and Technology, Xi'an 710055, China
|
17
|
Yang K, Cheng J, Cao S, Pan X, Shen HB, Yuan Y. Predicting transcriptional changes induced by molecules with MiTCP. Brief Bioinform 2024; 26:bbaf006. [PMID: 39847444 PMCID: PMC11756340 DOI: 10.1093/bib/bbaf006] [Received: 08/26/2024] [Revised: 12/05/2024] [Accepted: 01/21/2025] [Indexed: 01/24/2025] Open
Abstract
Studying the changes in cellular transcriptional profiles induced by small molecules can significantly advance our understanding of cellular state alterations and response mechanisms under chemical perturbations, which plays a crucial role in drug discovery and screening processes. Considering that experimental measurements need substantial time and cost, we developed a deep learning-based method called Molecule-induced Transcriptional Change Predictor (MiTCP) to predict changes in transcriptional profiles (CTPs) of 978 landmark genes induced by molecules. MiTCP utilizes graph neural network-based approaches to simultaneously model molecular structure representation and gene co-expression relationships, and integrates them for CTP prediction. After training on the L1000 dataset, MiTCP achieves an average Pearson correlation coefficient (PCC) of 0.482 on the test set and an average PCC of 0.801 for predicting the top 50 differentially expressed genes, which outperforms other existing methods. Furthermore, we used MiTCP to predict CTPs of three cancer drugs, palbociclib, irinotecan and goserelin, and performed gene enrichment analysis on the top differentially expressed genes and found that the enriched pathways and Gene Ontology terms are highly relevant to the corresponding diseases, which reveals the potential of MiTCP in drug development.
Affiliation(s)
- Kaiyuan Yang
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Jiabei Cheng
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Shenghao Cao
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Xiaoyong Pan
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Hong-Bin Shen
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
- Ye Yuan
  - Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China
  - State Key Laboratory of Biopharmaceutical Preparation and Delivery, Institute of Process Engineering, Chinese Academy of Sciences, 1 North 2nd Street, Zhongguancun, Haidian District, Beijing 100190, China
|
18
|
Zhong S, Chen Y, Li J, Igou T, Xiong A, Guan J, Dai Z, Cai X, Qu X, Chen Y. Screening Environmentally Benign Ionic Liquids for CO2 Absorption Using Representation Uncertainty-Based Machine Learning. ENVIRONMENTAL SCIENCE & TECHNOLOGY LETTERS 2024; 11:1193-1199. [PMID: 39554598 PMCID: PMC11562734 DOI: 10.1021/acs.estlett.4c00524] [Received: 07/01/2024] [Revised: 09/02/2024] [Accepted: 09/05/2024] [Indexed: 11/19/2024]
Abstract
Screening ionic liquids (ILs) with low viscosity, low toxicity, and high CO2 absorption using machine learning (ML) models is crucial for mitigating global warming. However, when candidate ILs fall into the extrapolation zone of ML models, predictions may become unreliable, leading to poor decision-making. In this study, we introduce a "representation uncertainty" (RU) approach to quantify prediction uncertainty by employing four IL representations: molecular fingerprint, molecular descriptor, molecular image, and molecular graph. We develop four types of ML models based on these representations and calculate RU as the standard deviation of predictions across these models. Compared to traditional model uncertainty (MU), which is based on hyperparameter variations within a single representation, RU outperforms MU in identifying unreliable predictions across four IL property data sets: viscosity, toxicity, refractive index, and CO2 absorption capacity. Furthermore, we develop ensemble models from the four types of models, which show superior predictive performance compared with that of individual models. Using the RU approach, we screened 1420 ILs and identified 37 promising candidates with low viscosity, low toxicity, and high CO2 absorption capacity. The predictive performance of our ensemble model, along with the effectiveness of the RU-based approach, was experimentally validated by testing the CO2 absorption capacity of 14 ILs. This study not only offers a more reliable method for screening and designing ILs, accelerating the discovery process, but also introduces a new perspective on developing ensemble models with enhanced predictive performance.
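The representation uncertainty (RU) defined in this abstract, the standard deviation of predictions across models built on different molecular representations, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `representation_uncertainty`, the toy prediction values, and the choice of the population standard deviation are assumptions, and averaging the four predictions is only one plausible way to form the ensemble the abstract mentions.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def representation_uncertainty(
    preds_by_representation: Dict[str, List[float]],
) -> List[Tuple[float, float]]:
    """For each candidate, return (ensemble mean, RU) across representations.

    RU is the standard deviation of the per-representation predictions;
    the population form (pstdev) is used here as an assumption.
    """
    columns = list(preds_by_representation.values())
    n_candidates = len(columns[0])
    if any(len(col) != n_candidates for col in columns):
        raise ValueError("every representation must predict the same candidates")
    # zip(*columns) groups the predictions made for the same candidate.
    return [(mean(vals), pstdev(vals)) for vals in zip(*columns)]


# Hypothetical predictions for two ionic liquids from four models,
# one per representation named in the abstract.
preds = {
    "fingerprint": [1.0, 2.0],
    "descriptor": [1.0, 4.0],
    "image": [1.0, 2.0],
    "graph": [1.0, 4.0],
}
results = representation_uncertainty(preds)
```

In this toy case the four models agree on the first candidate (RU of 0) and disagree on the second (RU of 1), so the second prediction would be the one flagged as less reliable under any threshold a user chooses.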
Affiliation(s)
- Shifa Zhong
  - Department of Environmental Science, Institute of Eco-Chongming, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, P. R. China
- Yushan Chen
  - School of Civil & Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Jibai Li
  - Department of Environmental Science, Institute of Eco-Chongming, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, P. R. China
- Thomas Igou
  - School of Civil & Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Anyue Xiong
  - Fort Richmond Collegiate, Winnipeg, MB R3T 3B3, Canada
- Jian Guan
  - Department of Environmental Science, Institute of Eco-Chongming, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, P. R. China
- Zhenhua Dai
  - Department of Environmental Science, Institute of Eco-Chongming, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, P. R. China
- Xuanying Cai
  - Department of Environmental Science, Institute of Eco-Chongming, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, P. R. China
- Xintong Qu
  - Department of Environmental Science, Institute of Eco-Chongming, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, P. R. China
- Yongsheng Chen
  - School of Civil & Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
|
19
|
Devore DP, Shuford KL. Data and Molecular Fingerprint-Driven Machine Learning Approaches to Halogen Bonding. J Chem Inf Model 2024; 64:8201-8214. [PMID: 39469831 DOI: 10.1021/acs.jcim.4c01427] [Indexed: 10/30/2024]
Abstract
The ability to predict the strength of halogen bonds and properties of halogen bond (XB) donors has significant utility for medicinal chemistry and materials science. XBs are typically calculated through expensive ab initio methods. Thus, the development of tools and techniques for fast, accurate, and efficient property predictions has become increasingly important. Herein, we employ three machine learning models to classify the XB donors and complexes by their principal halogen atom as well as predict the values of the maximum point on the electrostatic potential surface (V_S,max) and interaction strength of the XB complexes through a molecular fingerprint and data-based analysis. The fingerprint analysis produces a root-mean-square error of ca. 7.5 and ca. 5.5 kcal mol⁻¹ while predicting the V_S,max for the halobenzene and haloethynylbenzene systems, respectively. However, the prediction of the binding energy between the XB donors and ammonia acceptor is shown to be within 1 kcal mol⁻¹ of the density functional theory (DFT)-calculated energy. More accurate predictions can be made from the precalculated DFT data when compared to the fingerprint analysis.
Affiliation(s)
- Daniel P Devore
  - Department of Chemistry and Biochemistry, Baylor University, One Bear Place #97348, Waco, Texas 76798-7348, United States
- Kevin L Shuford
  - Department of Chemistry and Biochemistry, Baylor University, One Bear Place #97348, Waco, Texas 76798-7348, United States
|
20
|
Zhang J, Fu K, Wang D, Zhou S, Luo J. Refining hydrogel-based sorbent design for efficient toxic metal removal using machine learning-Bayesian optimization. JOURNAL OF HAZARDOUS MATERIALS 2024; 479:135688. [PMID: 39236540 DOI: 10.1016/j.jhazmat.2024.135688] [Received: 06/07/2024] [Revised: 07/28/2024] [Accepted: 08/26/2024] [Indexed: 09/07/2024]
Abstract
Hydrogel-based sorbents show promise in the removal of toxic metals from water. However, optimizing their performance through conventional trial-and-error methods is both costly and challenging due to the inherent high-dimensional parameter space associated with complex condition combinations. In this study, machine learning (ML) was employed to uncover the relationship between the fabrication conditions of hydrogel sorbents and their efficiency in removing toxic metals. The developed XGBoost models demonstrated exceptional accuracy in predicting hydrogel adsorption coefficients (Kd) based on synthesis materials and fabrication conditions. Key factors such as reaction temperature (50-70 °C), time (5-72 h), initiator ((NH4)2S2O8: 2.3-10.3 mol%), and crosslinker (Methylene-Bis-Acrylamide: 1.5-4.3 mol%) significantly influenced Kd. Subsequently, ten hydrogels were fabricated utilizing these optimized feature combinations based on Bayesian optimization, exhibiting superior toxic metal adsorption capabilities that surpassed existing limits (logKd (Cu): increased from 2.70 to 3.06; logKd (Pb): increased from 2.76 to 3.37). Within these determined combinations, the error range (0.025-0.172) between model predictions and experimental validations for logKd (Pb) indicated negligible disparity. Our research outcomes not only offer valuable insights but also provide practical guidance, highlighting the potential for custom-tailored hydrogel designs to combat specific contaminants, courtesy of ML-based Bayesian optimization.
Affiliation(s)
- Jing Zhang
  - State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Kaixing Fu
  - State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Dawei Wang
  - Key Laboratory of Integrated Regulation and Resource Development on Shallow Lake of Ministry of Education, College of Environment, Hohai University, Nanjing 210098, PR China
- Shiqing Zhou
  - Hunan Engineering Research Center of Water Security Technology and Application, College of Civil Engineering, Hunan University, Changsha 410082, PR China
- Jinming Luo
  - State Environmental Protection Key Laboratory of Environmental Health Impact Assessment of Emerging Contaminants, School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, PR China.
|
21
|
Han Z, Shen Z, Pei J, You Q, Zhang Q, Wang L. Transformation of peptides to small molecules in medicinal chemistry: Challenges and opportunities. Acta Pharm Sin B 2024; 14:4243-4265. [PMID: 39525591 PMCID: PMC11544290 DOI: 10.1016/j.apsb.2024.06.019] [Received: 03/07/2024] [Revised: 05/14/2024] [Accepted: 06/11/2024] [Indexed: 11/16/2024] Open
Abstract
Peptides are native binders involved in numerous physiological processes, such as cellular signaling, and serve as ready-made regulators of biochemical processes. Meanwhile, small molecules compose many drugs owing to their outstanding advantages of physicochemical properties and synthetic convenience. A novel field of research is converting peptides into small molecules, providing a convenient portable solution for drug design or peptidomic research. Endowing properties of peptides onto small molecules can evolutionarily combine the advantages of both moieties and improve the biological druggability of molecules. Herein, we present eight representative recent cases in this conversion and elaborate on the transformation process of each case. We discuss the innovative technological methods and research approaches involved, and analyze the applicability conditions of the approaches and methods in each case, guiding further modifications of peptides to small molecules. Finally, based on the aforementioned cases, we summarize a general procedure for peptide-to-small molecule modifications, listing the technological methods available for each transformation step and providing our insights on the applicable scenarios for these methods. This review aims to present the progress of peptide-to-small molecule modifications and propose our thoughts and perspectives for future research in this field.
Collapse
Affiliation(s)
- Zeyu Han
  - State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University, Nanjing 210009, China
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Zekai Shen
  - State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University, Nanjing 210009, China
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Jiayue Pei
  - State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University, Nanjing 210009, China
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Qidong You
  - State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University, Nanjing 210009, China
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Qiuyue Zhang
  - State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University, Nanjing 210009, China
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Lei Wang
  - State Key Laboratory of Natural Medicines and Jiangsu Key Laboratory of Drug Design and Optimization, China Pharmaceutical University, Nanjing 210009, China
  - Department of Medicinal Chemistry, School of Pharmacy, China Pharmaceutical University, Nanjing 210009, China
22
Sun D, Macedonia C, Chen Z, Chandrasekaran S, Najarian K, Zhou S, Cernak T, Ellingrod VL, Jagadish HV, Marini B, Pai M, Violi A, Rech JC, Wang S, Li Y, Athey B, Omenn GS. Can Machine Learning Overcome the 95% Failure Rate and Reality that Only 30% of Approved Cancer Drugs Meaningfully Extend Patient Survival? J Med Chem 2024; 67:16035-16055. [PMID: 39253942 DOI: 10.1021/acs.jmedchem.4c01684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Despite implementing hundreds of strategies, cancer drug development suffers from a 95% failure rate over 30 years, with only 30% of approved cancer drugs extending patient survival beyond 2.5 months. Adding more criteria without eliminating nonessential ones is impractical and may fall into the "survivorship bias" trap. Machine learning (ML) models may enhance efficiency by saving time and cost. Yet, they may not improve success rate without identifying the root causes of failure. We propose a "STAR-guided ML system" (structure-tissue/cell selectivity-activity relationship) to enhance success rate and efficiency by addressing three overlooked interdependent factors: potency/specificity to the on/off-targets determining efficacy in tumors at clinical doses, on/off-target-driven tissue/cell selectivity influencing adverse effects in the normal organs at clinical doses, and optimal clinical doses balancing efficacy/safety as determined by potency/specificity and tissue/cell selectivity. STAR-guided ML models can directly predict clinical dose/efficacy/safety from five features to design/select the best drugs, enhancing success and efficiency of cancer drug development.
Affiliation(s)
- Zhigang Chen
  - LabBotics.ai, Palo Alto, California 94303, United States
- Simon Zhou
  - Aurinia Pharmaceuticals Inc., Rockville, Maryland 20850, United States
- Yan Li
  - Translational Medicine and Clinical Pharmacology, Bristol Myers Squibb, Summit, New Jersey 07901, United States
23
Madushanka A, Laird E, Clark C, Kraka E. SmartCADD: AI-QM Empowered Drug Discovery Platform with Explainability. J Chem Inf Model 2024; 64:6799-6813. [PMID: 39177478 DOI: 10.1021/acs.jcim.4c00720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2024]
Abstract
Artificial intelligence (AI) has emerged as a pivotal force in enhancing productivity across various sectors, with its impact being profoundly felt within the pharmaceutical and biotechnology domains. Despite AI's rapid adoption, its integration into scientific research faces resistance due to myriad challenges: the opaqueness of AI models, the intricate nature of their implementation, and the issue of data scarcity. In response to these impediments, we introduce SmartCADD, an innovative, open-source virtual screening platform that combines deep learning, computer-aided drug design (CADD), and quantum mechanics methodologies within a user-friendly Python framework. SmartCADD is engineered to streamline the construction of comprehensive virtual screening workflows that incorporate a variety of formerly independent techniques, spanning ADMET property predictions, de novo 2D and 3D pharmacophore modeling, molecular docking, and the integration of explainable AI mechanisms. This manuscript highlights the foundational principles, key functionalities, and the unique integrative approach of SmartCADD. Furthermore, we demonstrate its efficacy through a case study focused on the identification of promising lead compounds for HIV inhibition. By democratizing access to advanced AI and quantum mechanics tools, SmartCADD stands as a catalyst for progress in pharmaceutical research and development, heralding a new era of innovation and efficiency.
Affiliation(s)
- Ayesh Madushanka
  - Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States
- Eli Laird
  - Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States
- Corey Clark
  - Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States
- Elfi Kraka
  - Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States
24
Bhattacharya D, Cassady HJ, Hickner MA, Reinhart WF. Large Language Models as Molecular Design Engines. J Chem Inf Model 2024. [PMID: 39231030 DOI: 10.1021/acs.jcim.4c01396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2024]
Abstract
The design of small molecules is crucial for technological applications ranging from drug discovery to energy storage. Due to the vast design space available to modern synthetic chemistry, the community has increasingly sought to use data-driven and machine learning approaches to navigate this space. Although generative machine learning methods have recently shown potential for computational molecular design, their use is hindered by complex training procedures, and they often fail to generate valid and unique molecules. In this context, pretrained Large Language Models (LLMs) have emerged as potential tools for molecular design, as they appear to be capable of creating and modifying molecules based on simple instructions provided through natural language prompts. In this work, we show that the Claude 3 Opus LLM can read, write, and modify molecules according to prompts, with impressive 97% valid and unique molecules. By quantifying these modifications in a low-dimensional latent space, we systematically evaluate the model's behavior under different prompting conditions. Notably, the model is able to perform guided molecular generation when asked to manipulate the electronic structure of molecules using simple, natural-language prompts. Our findings highlight the potential of LLMs as powerful and versatile molecular design engines.
Affiliation(s)
- Debjyoti Bhattacharya
  - Materials Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Harrison J Cassady
  - Department of Chemical Engineering and Material Science, Michigan State University, East Lansing, Michigan 48824, United States
- Michael A Hickner
  - Department of Chemical Engineering and Material Science, Michigan State University, East Lansing, Michigan 48824, United States
- Wesley F Reinhart
  - Materials Science and Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, United States
  - Institute for Computational and Data Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, United States
25
Xiao Z, Zhu M, Chen J, You Z. Integrated Transfer Learning and Multitask Learning Strategies to Construct Graph Neural Network Models for Predicting Bioaccumulation Parameters of Chemicals. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:15650-15660. [PMID: 39051472 DOI: 10.1021/acs.est.4c02421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Accurate prediction of parameters related to the environmental exposure of chemicals is crucial for the sound management of chemicals. However, the lack of large data sets for training models may result in poor prediction accuracy and robustness. Herein, integrated transfer learning (TL) and multitask learning (MTL) was proposed for constructing a graph neural network (GNN) model (abbreviated as TL-MTL-GNN model) using n-octanol/water partition coefficients as a source domain. The TL-MTL-GNN model was trained to predict three bioaccumulation parameters based on enlarged data sets that cover 2496 compounds with at least one bioaccumulation parameter. Results show that the TL-MTL-GNN model outperformed single-task GNN models with and without the TL, as well as conventional machine learning models trained with molecular descriptors or fingerprints. Applicability domains were characterized by a state-of-the-art structure-activity landscape-based (abbreviated as ADSAL) methodology. The TL-MTL-GNN model coupled with the optimal ADSAL was employed to predict bioaccumulation parameters for around 60,000 chemicals, with more than 13,000 compounds identified as bioaccumulative chemicals. The high predictive accuracy and robustness of the TL-MTL-GNN model demonstrate the feasibility of integrating the TL and MTL strategy in modeling small-sized data sets. The strategy holds significant potential for addressing small data challenges in modeling environmental chemicals.
Affiliation(s)
- Zijun Xiao
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Minghua Zhu
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
  - Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, College of Environment, Hohai University, Nanjing 210098, China
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zecang You
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
26
Ameta D, Behera L, Chakraborty A, Sandhan T. Predicting odor from vibrational spectra: a data-driven approach. Sci Rep 2024; 14:20321. [PMID: 39223164 PMCID: PMC11369114 DOI: 10.1038/s41598-024-70696-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Accepted: 08/20/2024] [Indexed: 09/04/2024] Open
Abstract
This study investigates olfaction, a complex and not well-understood sensory modality. The chemical mechanism behind smell can be described by the two theories proposed so far: the vibrational and docking theories. The vibrational theory has been gaining acceptance lately but needs more extensive validation. To fill this gap, we systematically analyze, for the first time and with the help of data-driven classification, clustering, and Explainable AI techniques, a large dataset of vibrational spectra (VS) of 3018 molecules obtained from atomistic simulation. The study utilizes image representations of VS using Gramian Angular Fields and Markov Transition Fields, allowing computer vision techniques to be applied for better feature extraction and improved odor classification. Furthermore, we fuse the PCA-reduced fingerprint features with image features, which shows additional improvement in classification results. We use two clustering methods, agglomerative hierarchical (AHC) and k-means, on dimensionality-reduced (UMAP, MDS, t-SNE, and PCA) VS and image features, which sheds further light on the connections between molecular structure, VS, and odor. Additionally, we contrast our method with an earlier work that employed traditional machine learning on fingerprint features for the same dataset, and demonstrate that even with a representative subset of 3018 molecules, our deep learning model outperforms previous results. This comprehensive and systematic analysis highlights the potential of deep learning in furthering the field of olfactory research while confirming the vibrational theory of olfaction.
Affiliation(s)
- Durgesh Ameta
  - Indian Knowledge System and Mental Health Applications Centre, Indian Institute of Technology, Mandi, 175005, India
  - Indian Knowledge System Centre, ISS, Delhi, 110065, India
- Laxmidhar Behera
  - Indian Knowledge System and Mental Health Applications Centre, Indian Institute of Technology, Mandi, 175005, India
  - Department of Electrical Engineering, Indian Institute of Technology, Kanpur, 208016, India
- Tushar Sandhan
  - Department of Electrical Engineering, Indian Institute of Technology, Kanpur, 208016, India
27
Bazuhair MA, Alghamdi AA, Baothman O, Afzal M, Alzarea SI, Imam F, Moglad E, Altayb HN. Chemical analogue based drug design for cancer treatment targeting PI3K: integrating machine learning and molecular modeling. Mol Divers 2024; 28:2345-2364. [PMID: 39154146 DOI: 10.1007/s11030-024-10966-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 08/08/2024] [Indexed: 08/19/2024]
Abstract
Cancer is a generic term for a group of disorders defined by uncontrolled cell growth and the potential to invade or spread to other parts of the body. Gene and epigenetic alterations disrupt normal cellular control, leading to abnormal cell proliferation, resistance to cell death, blood vessel development, and metastasis (spread to other organs). One of the several routes that play an important role in the development and progression of cancer is the phosphoinositide 3-kinase (PI3K) signaling pathway. Moreover, the gene PIK3CG encodes the catalytic subunit gamma (p110γ) of phosphoinositide 3-kinase (PI3Kγ), a member of the PI3K family. Therefore, in this study, PIK3CG was targeted to inhibit cancer by identifying a novel inhibitor through computational methods. The study screened 1015 chemical fragments against PIK3CG using machine learning-based binding estimation and docking to select the potential compounds. Later, the analogues were generated from the selected hits, and 414 analogues were selected, which were further screened, and as most potential candidates, three compounds were obtained: (a) 84,332, 190,213, and 885,387. The protein-ligand complex's stability and flexibility were then investigated by dynamic modeling. The 100 ns simulation revealed that 885,387 exhibited the steadiest deviation and constant creation of hydrogen bonds. Compared to the other compounds, 885,387 demonstrated a superior binding free energy (ΔG = -18.80 kcal/mol) with the protein when the MM/GBSA technique was used. The study determined that 885,387 showed significant therapeutic potential and justifies further experimental investigation as a possible inhibitor of the PIK3CG target implicated in cancer.
Affiliation(s)
- Mohammed A Bazuhair
  - Department of Clinical Pharmacology, Faculty of Medicine, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
  - Centre of Research Excellence for Drug Research and Pharmaceutical Industries, King Abdulaziz University, Jeddah, Saudi Arabia
- Anwar A Alghamdi
  - Health Information Technology Department, The Applied College; Pharmacovigilance and Medication Safety Unit, Centre of Research Excellence for Drug Research and Pharmaceutical Industries, King Abdulaziz University, Jeddah, Saudi Arabia
- Othman Baothman
  - Department of Biochemistry, Faculty of Science, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
- Muhammad Afzal
  - Department of Pharmaceutical Sciences, Batterjee Medical College, Pharmacy Program, P.O. Box 6231, 21442, Jeddah, Saudi Arabia
- Sami I Alzarea
  - Department of Pharmacology, College of Pharmacy, Jouf University, 72341, Aljouf, Sakaka, Saudi Arabia
- Faisal Imam
  - Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, P.O. Box 2457, 11451, Riyadh, Saudi Arabia
- Ehssan Moglad
  - Department of Pharmaceutics, College of Pharmacy, Prince Sattam Bin Abdulaziz University, P.O. Box 173, 11942, Alkharj, Saudi Arabia
- Hisham N Altayb
  - Department of Biochemistry, Faculty of Science, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
28
Wang S, Yue H, Yuan X. Accelerating Polymer Discovery with Uncertainty-Guided PGCNN: Explainable AI for Predicting Properties and Mechanistic Insights. J Chem Inf Model 2024; 64:5500-5509. [PMID: 38953249 DOI: 10.1021/acs.jcim.4c00555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]
Abstract
Deep learning holds great potential for expediting the discovery of new polymers from the vast chemical space. However, accurately predicting polymer properties for practical applications based on their monomer composition has long been a challenge. The main obstacles include insufficient data, ineffective representation encoding, and lack of explainability. To address these issues, we propose an interpretable model called the Polymer Graph Convolutional Neural Network (PGCNN) that can accurately predict various polymer properties. This model is trained using the RadonPy data set and validated using experimental data samples. By integrating evidential deep learning with the model, we can quantify the uncertainty of predictions and enable sample-efficient training through uncertainty-guided active learning. Additionally, we demonstrate that the global attention of the graph embedding can aid in discovering underlying physical principles by identifying important functional groups within polymers and associating them with specific material attributes. Lastly, we explore the high-throughput screening capability of our model by rapidly identifying thousands of promising candidates with low and high thermal conductivity from a pool of one million hypothetical polymers. In summary, our research not only advances our mechanistic understanding of polymers using explainable AI but also paves the way for data-driven trustworthy discovery of polymer materials.
Affiliation(s)
- Shuyu Wang
  - Department of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066000, China
- Hongxing Yue
  - Department of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066000, China
- Xiaoming Yuan
  - Department of Computer Science and Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, Hebei 066000, China
29
Tahıl G, Delorme F, Le Berre D, Monflier É, Sayede A, Tilloy S. Stereoisomers Are Not Machine Learning's Best Friends. J Chem Inf Model 2024; 64:5451-5469. [PMID: 38949069 DOI: 10.1021/acs.jcim.4c00318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
This study addresses the challenge of accurately identifying stereoisomers in cheminformatics, which originates from our objective to apply machine learning to predict the association constant between cyclodextrin and a guest. Identifying stereoisomers is indeed crucial for machine learning applications. Current tools offer various molecular descriptors, including their textual representation as Isomeric SMILES that can distinguish stereoisomers. However, such representation is text-based and does not have a fixed size, so a conversion is needed to make it usable to machine learning approaches. Word embedding techniques can be used to solve this problem. Mol2vec, a word embedding approach for molecules, offers such a conversion. Unfortunately, it cannot distinguish between stereoisomers due to its inability to capture the spatial configuration of molecular structures. This study proposes several approaches that use word embedding techniques to handle molecular discrimination using stereochemical information on molecules or considering Isomeric SMILES notation as a text in Natural Language Processing. Our aim is to generate a distinct vector for each unique molecule, correctly identifying stereoisomer information in cheminformatics. The proposed approaches are then compared to our original machine learning task: predicting the association constant between cyclodextrin and a guest molecule.
Affiliation(s)
- Gökhan Tahıl
  - Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
  - Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
- Fabien Delorme
  - Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
- Daniel Le Berre
  - Univ. Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France
- Éric Monflier
  - Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
- Adlane Sayede
  - Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
- Sébastien Tilloy
  - Univ. Artois, CNRS, Centrale Lille, Univ. Lille, UMR 8181, Unité de Catalyse et Chimie du Solide (UCCS), rue Jean Souvraz, SP 18, F-62307 Lens Cedex, France
30
Huang Y, Zhong S, Gan L, Chen Y. Development of Machine Learning Models for Ion-Selective Electrode Cation Sensor Design. ACS ES&T ENGINEERING 2024; 4:1702-1711. [PMID: 39021402 PMCID: PMC11250033 DOI: 10.1021/acsestengg.4c00087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 03/15/2024] [Accepted: 03/15/2024] [Indexed: 07/20/2024]
Abstract
Polyvinyl chloride (PVC) membrane-based ion-selective electrode (ISE) sensors are common tools for water assessments, but their development relies on time-consuming and costly experimental investigations. To address this challenge, this study combines machine learning (ML), Morgan fingerprint, and Bayesian optimization technologies with experimental results to develop high-performance PVC-based ISE cation sensors. By using 1745 data sets collected from 20 years of literature, appropriate ML models are trained to enable accurate prediction and a deep understanding of the relationship between ISE components and sensor performance (R² = 0.75). Rapid ionophore screening is achieved using the Morgan fingerprint based on atomic groups derived from ML model interpretation. Bayesian optimization is then applied to identify optimal combinations of ISE materials with the potential to deliver desirable ISE sensor performance. Na⁺, Mg²⁺, and Al³⁺ sensors fabricated from Bayesian optimization results exhibit excellent Nernst slopes with less than 8.2% deviation from the ideal value and superb detection limits at the 10⁻⁷ M level based on experimental validation results. This approach can potentially transform sensor development into a more time-efficient, cost-effective, and rational design process, guided by ML-based techniques.
Affiliation(s)
- Yuankai Huang
  - School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Shifa Zhong
  - School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
  - Department of Environmental Science, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, China
- Lan Gan
  - School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Yongsheng Chen
  - School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
31
Ivashchenko SD, Shulga DA, Ivashchenko VD, Zinovev EV, Vlasov AV. In silico studies of the open form of human tissue transglutaminase. Sci Rep 2024; 14:15981. [PMID: 38987418 PMCID: PMC11236986 DOI: 10.1038/s41598-024-66348-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 07/01/2024] [Indexed: 07/12/2024] Open
Abstract
Human tissue transglutaminase (tTG) is an intriguing multifunctional enzyme involved in various diseases, including celiac disease and neurological disorders. Although a number of tTG inhibitors have been developed, the molecular determinants governing ligand binding remain incompletely understood due to the lack of high-resolution structural data in the vicinity of its active site. In this study, we obtained the complete high-resolution model of tTG by in silico methods based on available PDB structures. We discovered significant differences in the active site architecture between our model and known tTG models, revealing an additional loop which affects the ligand binding affinity. We assembled a library of new potential tTG inhibitors based on the obtained complete model of the enzyme. Our library substantially expands the spectrum of possible drug candidates targeting tTG and encompasses twelve molecular scaffolds, eleven of which are novel and exhibit higher binding affinity than already known ones, according to our in silico studies. The results of this study open new directions for structure-based drug design of tTG inhibitors, offering the complete protein model and suggesting a wide range of new compounds for further experimental validation.
Affiliation(s)
- S D Ivashchenko
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia, 141701
  - Laboratory of Microbiology, BIOTECH University, Moscow, Russia, 125080
- D A Shulga
  - Department of Chemistry, Moscow State University, Moscow, Russia, 119991
- V D Ivashchenko
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia, 141701
- E V Zinovev
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia, 141701
- A V Vlasov
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia, 141701
  - Laboratory of Microbiology, BIOTECH University, Moscow, Russia, 125080
  - Joint Institute for Nuclear Research, Dubna, Russia, 141980
32
Shi WJ, Long XB, Xin L, Chen CE, Ying GG. Predicting the new psychoactive substance activity of antitussives and evaluating their ecotoxicity to fish. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 932:172872. [PMID: 38692322 DOI: 10.1016/j.scitotenv.2024.172872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/25/2024] [Accepted: 04/27/2024] [Indexed: 05/03/2024]
Abstract
The misuse of antitussive preparations is a continuing problem worldwide, which implies that they might have potential new psychoactive substance (NPS) activity. However, few studies have focused on their ecological toxicity towards fish. In the present study, the machine learning (ML) methods gcForest and random forest (RF) were employed to predict NPS activity in 30 antitussives. The potential toxic target, mode of action (MOA), acute toxicity and chronic toxicity to fish were further investigated. The results showed that both gcForest and RF achieved optimal performance when utilizing combined features of molecular fingerprint (MF) and molecular descriptor (MD), with area under the curve (AUC) = 0.99, accuracy > 0.94 and F1 score > 0.94, and were applied to screen the NPS activity in antitussives. A total of 15 antitussives exhibited potential NPS activity, including frequently used substances like codeine and dextromethorphan. The binding affinity of these antitussives with zebrafish dopamine transporter (zDAT) was high, even surpassing that of some traditional narcotics and NPS. Some antitussives formed hydrogen bonds or salt bridges with aspartate (Asp) 95, tyrosine (Tyr) 171 of zDAT. Regarding ecotoxicity, the MOA of these 15 antitussives in fish was predicted as narcosis. Prenoxdiazin, pholcodine, codeine, dextromethorphan and dextrorphan were very toxic or toxic to fish. Close attention should be paid to the ecotoxicity of these antitussives. In this study, the integration of ML, molecular docking and ECOSAR approaches proved to be a powerful tool for understanding the toxicity profiles and ecological hazards posed by new pollutants.
Affiliation(s)
- Wen-Jun Shi
  - SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Xiao-Bing Long
  - SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Lei Xin
  - SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Chang-Er Chen
  - SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
- Guang-Guo Ying
  - SCNU Environmental Research Institute, Guangdong Provincial Key Laboratory of Chemical Pollution and Environmental Safety & MOE Key Laboratory of Theoretical Chemistry of Environment, South China Normal University, Guangzhou 510006, China; School of Environment, South China Normal University, University Town, Guangzhou 510006, China
33
Yang M, Zhu JJ, McGaughey AL, Priestley RD, Hoek EMV, Jassby D, Ren ZJ. Machine Learning for Polymer Design to Enhance Pervaporation-Based Organic Recovery. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:10128-10139. [PMID: 38743597 DOI: 10.1021/acs.est.4c00060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Pervaporation (PV) is an effective membrane separation process for organic dehydration, recovery, and upgrading. However, it is crucial to improve membrane materials beyond the current permeability-selectivity trade-off. In this research, we introduce machine learning (ML) models to identify high-potential polymers, greatly improving efficiency and reducing cost compared with the conventional trial-and-error approach. We utilized the largest PV data set to date and incorporated polymer fingerprints and features, including membrane structure, operating conditions, and solute properties. Dimensionality reduction, missing data treatment, seed randomness, and data leakage management were employed to ensure model robustness. The optimized LightGBM models achieved RMSEs of 0.447 and 0.360 for separation factor and total flux, respectively (logarithmic scale). Screening approximately 1 million hypothetical polymers with ML models identified polymers with a predicted permeation separation index >30 and synthetic accessibility score <3.7 for acetic acid extraction. This study demonstrates the promise of ML to accelerate tailored membrane designs.
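The evaluation and screening steps summarized in this abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the candidate records, the field names, the base-10 logarithm, and the index formula PSI = J × (α − 1), which is a common textbook definition and is not confirmed by the abstract. Only the two thresholds (>30 and <3.7) come from the text.

```python
import math

def rmse_log10(y_true, y_pred):
    """RMSE between log10-transformed true and predicted values."""
    errs = [(math.log10(t) - math.log10(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(errs) / len(errs))

def psi(total_flux, separation_factor):
    """Separation index, ASSUMED here to be J * (alpha - 1)."""
    return total_flux * (separation_factor - 1.0)

def screen(candidates, psi_min=30.0, sa_max=3.7):
    """Keep names whose predicted index exceeds psi_min and SA score is below sa_max."""
    return [c["name"] for c in candidates
            if psi(c["flux"], c["alpha"]) > psi_min and c["sa_score"] < sa_max]

# Hypothetical predictions for three made-up polymers (not data from the paper).
candidates = [
    {"name": "polymer_A", "flux": 2.0, "alpha": 20.0, "sa_score": 3.1},
    {"name": "polymer_B", "flux": 0.5, "alpha": 40.0, "sa_score": 2.9},
    {"name": "polymer_C", "flux": 3.0, "alpha": 25.0, "sa_score": 4.2},
]
selected = screen(candidates)
```

In this toy set only `polymer_A` passes both filters: `polymer_B` falls below the index threshold and `polymer_C` exceeds the synthetic accessibility limit.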
Affiliation(s)
- Meiqi Yang
  - Department of Civil and Environmental Engineering, Princeton University, Princeton, New Jersey 08544, United States
  - Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
- Jun-Jie Zhu
  - Department of Civil and Environmental Engineering, Princeton University, Princeton, New Jersey 08544, United States
  - Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
- Allyson L McGaughey
  - Department of Civil and Environmental Engineering, Princeton University, Princeton, New Jersey 08544, United States
  - Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Rodney D Priestley
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Eric M V Hoek
  - Department of Civil & Environmental Engineering, University of California Los Angeles, Los Angeles, California 90095, United States
- David Jassby
  - Department of Civil & Environmental Engineering, University of California Los Angeles, Los Angeles, California 90095, United States
- Zhiyong Jason Ren
  - Department of Civil and Environmental Engineering, Princeton University, Princeton, New Jersey 08544, United States
  - Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
|
34
|
Pang W, Chen M, Qin Y. Prediction of anticancer drug sensitivity using an interpretable model guided by deep learning. BMC Bioinformatics 2024; 25:182. [PMID: 38724920 PMCID: PMC11080240 DOI: 10.1186/s12859-024-05669-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 01/22/2024] [Indexed: 05/13/2024] Open
Abstract
BACKGROUND The prediction of drug sensitivity plays a crucial role in improving the therapeutic effect of drugs. However, testing the effectiveness of drugs is challenging due to the complex mechanism of drug reactions and the lack of interpretability in most machine learning and deep learning methods. Therefore, it is imperative to establish an interpretable model that receives various cell line and drug feature data to learn drug response mechanisms and achieve stable predictions between available datasets. RESULTS This study proposes a new and interpretable deep learning model, DrugGene, which integrates gene expression, gene mutation, gene copy number variation of cancer cells, and chemical characteristics of anticancer drugs to predict their sensitivity. This model comprises two different branches of neural networks, where the first involves a hierarchical structure of biological subsystems that uses the biological processes of human cells to form a visual neural network (VNN) and an interpretable deep neural network for human cancer cells. DrugGene receives genotype input from the cell line and detects changes in the subsystem states. We also employ a traditional artificial neural network (ANN) to capture the chemical structural features of drugs. DrugGene generates final drug response predictions by combining VNN and ANN and integrating their outputs into a fully connected layer. The experimental results using drug sensitivity data extracted from the Cancer Drug Sensitivity Genome Database and the Cancer Treatment Response Portal v2 reveal that the proposed model is better than existing prediction methods. Therefore, our model achieves higher accuracy, learns the reaction mechanisms between anticancer drugs and cell lines from various features, and interprets the model's predicted results. 
CONCLUSIONS Our method utilizes biological pathways to construct neural networks, which can use genotypes to monitor changes in the state of network subsystems, thereby interpreting the prediction results in the model and achieving satisfactory prediction accuracy. This will help explore new directions in cancer treatment. The code is freely available on GitHub (https://github.com/pangweixiong/DrugGene).
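The final fusion step described in the abstract (two branch outputs integrated into a fully connected layer) might look roughly like the sketch below. This is not the DrugGene implementation: fusing by concatenation, the embedding sizes, the weights, and the function names are all hypothetical and chosen only to show the general idea.

```python
def linear(x, weights, bias):
    """Fully connected layer: one output per row of the weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def fuse_and_predict(vnn_out, ann_out, weights, bias):
    """Concatenate the two branch outputs (an assumed fusion rule) and apply one linear layer."""
    combined = list(vnn_out) + list(ann_out)
    return linear(combined, weights, bias)[0]

# Hypothetical branch outputs and layer parameters (illustrative values only).
vnn_out = [0.2, -0.1]   # cell-line branch output
ann_out = [0.5]         # drug-structure branch output
weights = [[1.0, 2.0, -1.0]]
bias = [0.1]
prediction = fuse_and_predict(vnn_out, ann_out, weights, bias)
```

In a trained model the weights would be learned jointly with both branches; here they are fixed so the arithmetic can be followed by hand.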
Affiliation(s)
- Weixiong Pang
  - College of Information Technology, Shanghai Ocean University, Hucheng Ring Road, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Ming Chen
  - College of Information Technology, Shanghai Ocean University, Hucheng Ring Road, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Yufang Qin
  - College of Information Technology, Shanghai Ocean University, Hucheng Ring Road, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
|
35
|
Yang Q, Fan L, Hao E, Hou X, Deng J, Xia Z, Du Z. Construction of An Oral Bioavailability Prediction Model Based on Machine Learning for Evaluating Molecular Modifications. J Pharm Sci 2024; 113:1155-1167. [PMID: 38430955 DOI: 10.1016/j.xphs.2024.02.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 02/26/2024] [Accepted: 02/26/2024] [Indexed: 03/05/2024]
Abstract
OBJECTIVE This study aims to explore the impact of ADME on the Oral Bioavailability (OB) of drugs and to construct a machine learning model for OB prediction. The model is then applied to predict the OB of modified berberine and atenolol molecules to obtain structures with higher OB. METHODS Initially, a drug OB database was established, and corresponding ADME characteristics were obtained. The relationship between ADME and OB was analyzed using machine learning, with Morgan fingerprints serving as molecular descriptors. Compounds from the database were input into Random Forest, XGBoost, CatBoost, and LightGBM machine learning models to train the OB prediction model and evaluate its performance. Subsequently, berberine and atenolol were modified using Chemdraw software with ten different substituents for mono-substitution, and chlorine atoms for a full range of double substitutions. The modified molecular structures were converted into the same format as the training set for OB prediction. The predicted OB values of the modified structures of berberine and atenolol were compared. RESULTS An OB database of 386 drugs was obtained. It was found that smaller molecular weight and a higher number of rotatable bonds (ten or less) could potentially lead to higher OB. The four machine learning models were evaluated using MSE, R² score, MAE, and MFE as metrics, with Random Forest performing the best. The models' predictions for the test set were particularly accurate when OB ranged from 30% to 90%. After mono-substitution and double substitution of berberine and atenolol, the OB of both drugs was significantly improved.
CONCLUSIONS This study found that some ADME properties of molecules do not have an absolute impact on OB. The database played a decisive role in the process of the machine learning OB prediction model, and the performance of the model was evaluated based on predictions within a range of strong generalization ability. In most cases, mono-substitution and double substitution were beneficial for enhancing the OB of berberine and atenolol. In summary, this study successfully constructed a machine learning regression prediction model that can accurately predict drug OB, which can guide drug design to achieve higher OB to some extent.
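Three of the evaluation metrics named in this abstract (MSE, MAE, and the R² score) can be computed as in the sketch below, using their standard definitions. The OB values are made up for illustration and are not data from the study; MFE is left out because the abstract does not define it.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                       # 1 - SS_res / SS_tot
    return {"mse": mse, "mae": mae, "r2": r2}

# Hypothetical OB percentages (illustrative only).
y_true = [30.0, 50.0, 70.0, 90.0]
y_pred = [35.0, 45.0, 75.0, 85.0]
metrics = regression_metrics(y_true, y_pred)
```

For these toy values every prediction is off by 5 percentage points, which gives an MSE of 25, an MAE of 5, and an R² of 0.95.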
Affiliation(s)
- Qi Yang
  - School of Pharmacy, Guangxi University of Chinese Medicine, Nanning 530200, China
- Lili Fan
  - School of Pharmacy, Guangxi University of Chinese Medicine, Nanning 530200, China
- Erwei Hao
  - Guangxi Key Laboratory of Efficacy Study on Chinese Materia Medica, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Collaborative Innovation Center for Research on Functional Ingredients of Agricultural Residues, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Key Laboratory of Traditional Chinese Medicine Formulas Theory and Transformation for Damp Diseases, Guangxi University of Chinese Medicine, Nanning 530200, China
- Xiaotao Hou
  - Guangxi Key Laboratory of Efficacy Study on Chinese Materia Medica, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Collaborative Innovation Center for Research on Functional Ingredients of Agricultural Residues, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Key Laboratory of Traditional Chinese Medicine Formulas Theory and Transformation for Damp Diseases, Guangxi University of Chinese Medicine, Nanning 530200, China
- Jiagang Deng
  - Guangxi Key Laboratory of Efficacy Study on Chinese Materia Medica, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Collaborative Innovation Center for Research on Functional Ingredients of Agricultural Residues, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Key Laboratory of Traditional Chinese Medicine Formulas Theory and Transformation for Damp Diseases, Guangxi University of Chinese Medicine, Nanning 530200, China
- Zhongshang Xia
  - Guangxi Key Laboratory of Efficacy Study on Chinese Materia Medica, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Collaborative Innovation Center for Research on Functional Ingredients of Agricultural Residues, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Key Laboratory of Traditional Chinese Medicine Formulas Theory and Transformation for Damp Diseases, Guangxi University of Chinese Medicine, Nanning 530200, China
- Zhengcai Du
  - Guangxi Key Laboratory of Efficacy Study on Chinese Materia Medica, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Collaborative Innovation Center for Research on Functional Ingredients of Agricultural Residues, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Key Laboratory of Traditional Chinese Medicine Formulas Theory and Transformation for Damp Diseases, Guangxi University of Chinese Medicine, Nanning 530200, China; Guangxi Scientific Research Center of Traditional Chinese Medicine, Guangxi University of Chinese Medicine, Nanning 530200, China
|
36
|
Zhang ZM, Huang Y, Liu G, Yu W, Xie Q, Chen Z, Huang G, Wei J, Zhang H, Chen D, Du H. Development of machine learning-based predictors for early diagnosis of hepatocellular carcinoma. Sci Rep 2024; 14:5274. [PMID: 38438393 PMCID: PMC10912761 DOI: 10.1038/s41598-024-51265-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 01/03/2024] [Indexed: 03/06/2024] Open
Abstract
Hepatocellular carcinoma (HCC) remains a formidable malignancy that significantly impacts human health, and the early diagnosis of HCC holds paramount importance. Therefore, it is imperative to develop an efficacious signature for the early diagnosis of HCC. In this study, we aimed to develop early HCC predictors (eHCC-pred) using machine learning-based methods and compare their performance with existing methods. The enhancements and advancements of eHCC-pred encompassed the following: (i) utilization of a substantial number of samples, including an increased representation of cirrhosis tissues without HCC (CwoHCC) samples for model training and augmented numbers of HCC and CwoHCC samples for model validation; (ii) incorporation of two feature selection methods, namely minimum redundancy maximum relevance and maximum relevance maximum distance, along with the inclusion of eight machine learning-based methods; (iii) improvement in the accuracy of early HCC identification, elevating it from 78.15% to 97% using identical independent datasets; and (iv) establishment of a user-friendly web server. The eHCC-pred is freely accessible at http://www.dulab.com.cn/eHCC-pred/. Our approach, eHCC-pred, is anticipated to be robustly employed at the individual level for facilitating early HCC diagnosis in clinical practice, surpassing currently available state-of-the-art techniques.
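The idea behind minimum redundancy maximum relevance feature selection, one of the two methods named in this abstract, can be sketched with a simple greedy loop. This is an illustrative simplification and not the eHCC-pred code: it substitutes absolute Pearson correlation for the mutual information that mRMR typically uses, and the feature names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def mrmr(features, labels, k):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = abs(pearson(features[name], labels))
            if not selected:
                return relevance
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical expression values for three made-up genes across six samples.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "gene_1": [1, 2, 3, 4, 5, 6],   # strongly tracks the label
    "gene_2": [1, 2, 4, 3, 5, 6],   # near-duplicate of gene_1
    "gene_3": [0, 1, 0, 1, 0, 1],   # weaker signal, little overlap with gene_1
}
chosen = mrmr(features, labels, k=2)
```

With these toy values the second pick is `gene_3` rather than `gene_2`: although `gene_2` is more relevant on its own, it is penalized for being almost redundant with the already selected `gene_1`.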
Affiliation(s)
- Zi-Mei Zhang
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Yuting Huang
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Guanghao Liu
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
  - Fujian Key Laboratory of Medical Bioinformatics, Department of Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, 350122, China
- Wenqi Yu
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Qingsong Xie
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Zixi Chen
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Guanda Huang
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Jinfen Wei
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Haibo Zhang
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Dong Chen
  - Fangrui Institute of Innovative Drugs, South China University of Technology, Guangzhou, China
- Hongli Du
  - School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
|