1
Zhu M, Fang Y, Jia M, Chen L, Zhang L, Wu B. Using machine learning models to predict the dose-effect curve of municipal wastewater for zebrafish embryo toxicity. Journal of Hazardous Materials 2025; 488:137278. [PMID: 39899932 DOI: 10.1016/j.jhazmat.2025.137278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 01/16/2025] [Accepted: 01/17/2025] [Indexed: 02/05/2025]
Abstract
Municipal wastewater substantially contributes to aquatic ecological risks. Assessing the toxicity of municipal wastewater through dose-effect curves is challenging because biological assays are time-consuming, labor-intensive, and costly. This study developed machine learning models to predict wastewater dose-effect curves for zebrafish embryos. Influent and effluent samples from 176 wastewater treatment plants in China were analyzed to collect water quality data, including seven chemical parameters and the toxic effects on zebrafish embryos at eight relative enrichment factors (REFs) of wastewater. Using Spearman's rank correlation coefficient and the max-relevance and min-redundancy algorithm, ammonium nitrogen content and the toxic effect values at REFs of 2 and 25 (REF2 and REF25) were identified as the crucial input features among 15 variables. Decision tree, random forest, and gradient-boosted decision tree (GBDT) models were developed. Among these, GBDT performed best, with an average R² of 0.91 and an average mean absolute percentage error (MAPE) of 27.91%. Integrating the dose-effect curve pattern into the machine learning model considerably improved the GBDT model, which reached a minimum MAPE of 14.74%. The developed model can accurately determine the dose-effect curves of actual wastewater while reducing the experimental workload by at least 75%. These findings provide a valuable tool for assessing zebrafish embryo toxicity in municipal wastewater management. This study indicates that combining environmental expertise with machine learning models allows for a scientific assessment of the potential toxic risks in wastewater, providing new perspectives and approaches for environmental policy development.
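The two reported metrics, R² and MAPE, can be sketched in a few lines of plain Python. This is only an illustration of the standard definitions, not the authors' code; the function names are hypothetical. The `workload_saved` helper encodes one possible reading of the 75% figure, namely that only two of the eight REFs (REF2 and REF25) still need to be measured; the abstract does not state the calculation, so treat that as an assumption.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no true value is zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def workload_saved(n_measured, n_total):
    """Fraction of assays avoided when only n_measured of n_total are run.

    Assumption: each REF level costs one assay of equal effort.
    """
    return 1.0 - n_measured / n_total


# Hypothetical reading of the abstract: 2 measured REFs out of 8.
saved = workload_saved(2, 8)
```

Under that reading, `saved` evaluates to 0.75, which matches the "at least 75%" claim.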
Affiliation(s)
- Mengyuan Zhu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, PR China
- Yushi Fang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, PR China
- Min Jia
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, PR China
- Ling Chen
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, PR China
- Linyu Zhang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, PR China
- Bing Wu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, PR China
2
Fallani A, Nugmanov R, Arjona-Medina J, Wegner JK, Tkatchenko A, Chernichenko K. Pretraining graph transformers with atom-in-a-molecule quantum properties for improved ADMET modeling. J Cheminform 2025; 17:25. [PMID: 40016793 PMCID: PMC11869672 DOI: 10.1186/s13321-025-00970-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Accepted: 02/07/2025] [Indexed: 03/01/2025]
Abstract
We evaluate the impact of pretraining Graph Transformer architectures on atom-level quantum-mechanical features for the modeling of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of drug-like compounds. We compare this pretraining strategy with two others: one based on molecular quantum properties (specifically the HOMO-LUMO gap) and one using a self-supervised atom masking technique. After fine-tuning on Therapeutic Data Commons ADMET datasets, we evaluate the performance improvement in the different models and observe that models pretrained with atomic quantum mechanical properties generally produce better results. We then analyze the latent representations and observe that the supervised strategies preserve the pretraining information after fine-tuning and that different pretrainings produce different trends in latent expressivity across layers. Furthermore, we find that models pretrained on atomic quantum mechanical properties capture more low-frequency Laplacian eigenmodes of the input graph via the attention weights and produce better representations of atomic environments within the molecule. Application of the analysis to a much larger non-public dataset for microsomal clearance illustrates the generalizability of the studied indicators. In this case the performances of the models are in accordance with the representation analysis and highlight, especially for masking pretraining and atom-level quantum property pretraining, how model types with similar performance on public benchmarks can perform differently on large-scale pharmaceutical data.
Scientific contribution: We systematically compared three different data types/methodologies for pretraining a molecular Graphormer with the purpose of modeling ADMET properties as downstream tasks. The learned representations from differently pretrained models were analyzed in addition to the comparison of downstream task performances typically reported in similar works. Such examination methodologies, including a newly introduced analysis of Graphormer's Attention Rollout Matrix, can guide pretraining strategy selection, as corroborated by a performance evaluation on a larger internal dataset.
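For readers unfamiliar with attention rollout, the sketch below shows the commonly used generic formulation: each layer's attention matrix is mixed with the identity to account for residual connections, re-normalized, and the layers are multiplied together. It assumes head-averaged, row-stochastic attention matrices and an equal 0.5/0.5 mixing weight; the abstract does not describe the authors' Graphormer-specific variant, so this may differ from what they computed. The function name is hypothetical.

```python
import numpy as np


def attention_rollout(layer_attn):
    """Generic attention rollout over a list of (n, n) attention matrices.

    Assumptions: matrices are ordered from first to last layer, already
    averaged over heads, and each row sums to 1.
    """
    n = layer_attn[0].shape[0]
    rollout = np.eye(n)
    for attn in layer_attn:
        # Mix in the identity to model the residual (skip) connection.
        mixed = 0.5 * attn + 0.5 * np.eye(n)
        # Re-normalize rows so each layer stays row-stochastic.
        mixed = mixed / mixed.sum(axis=-1, keepdims=True)
        # Compose with the rollout accumulated from earlier layers.
        rollout = mixed @ rollout
    return rollout
```

Row i of the result approximates how much each input node contributes to node i after all layers, which is the kind of matrix one could then compare against graph Laplacian eigenmodes.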
Affiliation(s)
- Alessio Fallani
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
  - Drug Discovery Data Sciences, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Ramil Nugmanov
  - Drug Discovery Data Sciences, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Jose Arjona-Medina
  - Drug Discovery Data Sciences, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
- Jörg Kurt Wegner
  - Johnson & Johnson Innovative Medicine, 301 Binney Street, Cambridge, MA, 02142, USA
- Alexandre Tkatchenko
  - Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg, Luxembourg
- Kostiantyn Chernichenko
  - Drug Discovery Data Sciences, Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340, Beerse, Belgium
3
Xiao F, Ding X, Shi Y, Wang D, Wang Y, Cui C, Zhu T, Chen K, Xiang P, Luo X. Application of ensemble learning for predicting GABAA receptor agonists. Comput Biol Med 2024; 169:107958. [PMID: 38194778 DOI: 10.1016/j.compbiomed.2024.107958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 12/29/2023] [Accepted: 01/01/2024] [Indexed: 01/11/2024]
Abstract
BACKGROUND Over the past few decades, agonists binding to the benzodiazepine site of the GABAA receptor have been successfully developed as clinical drugs. Different modulators (agonists, antagonists, and inverse agonists) bound to the benzodiazepine site exhibit different or even opposite pharmacological effects; however, their structures are so similar that it is difficult to distinguish them based solely on the molecular skeleton. This study aims to develop classification models for predicting agonists. METHODS A total of 306 agonists and non-agonists were collected from the literature. Six machine learning algorithms (RF, XGBoost, AdaBoost, GBoost, SVM, and ANN) were employed for model development. Six descriptor sets (1D/2D descriptors, ECFP4, 2D-Pharmacophore, MACCS, PubChem, and E-state fingerprints) were used to characterize chemical structures. Model interpretability was explored with the SHAP method. RESULTS The best model demonstrated an AUC value of 0.905 and an MCC value of 0.808 for the test set. The PubMac-based model (PubMac-GB) achieved the best AUC value of 0.935 for the test set. The SHAP analysis emphasized that MaccsFP62, ECFP_624, ECFP_724, and PubchemFP213 were the crucial molecular features. Applicability domain analysis was also performed to determine reliable prediction boundaries for the model. The PubMac-GB model was applied to virtual screening for potential GABAA agonists, and the top 100 compounds were reported. CONCLUSION Overall, our ensemble learning-based model (PubMac-GB) achieved comparable performance and would be helpful in effectively identifying agonists of GABAA receptors.
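The Matthews correlation coefficient (MCC) reported above can be computed directly from a binary confusion matrix. The sketch below uses the standard definition in plain Python; the function name and the example counts in the test are illustrative only and are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC stays informative when agonists and non-agonists are imbalanced, which is a likely reason it is reported alongside AUC here.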
Affiliation(s)
- Fu Xiao
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210023, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
- Xiaoyu Ding
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Yan Shi
  - Academy of Forensic Science, Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Key Laboratory of Forensic Science, Ministry of Justice, Shanghai, 200063, China
- Dingyan Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Yitian Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Chen Cui
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Tingfei Zhu
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Kaixian Chen
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210023, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
- Ping Xiang
  - Academy of Forensic Science, Shanghai Key Laboratory of Forensic Medicine, Shanghai Forensic Service Platform, Key Laboratory of Forensic Science, Ministry of Justice, Shanghai, 200063, China
- Xiaomin Luo
  - School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing, 210023, China; Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing, 100049, China
4
Hao H, Li P, Jiao W, Ge D, Hu C, Li J, Lv Y, Chen W. Ensemble learning-based applied research on heavy metals prediction in a soil-rice system. The Science of the Total Environment 2023; 898:165456. [PMID: 37451444 DOI: 10.1016/j.scitotenv.2023.165456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 07/06/2023] [Accepted: 07/08/2023] [Indexed: 07/18/2023]
Abstract
Accurate prediction of heavy metal accumulation in soil ecosystems is crucial for maintaining healthy soil environments and ensuring high-quality agricultural products, and it remains a challenging scientific task. In this study, we constructed a dataset containing 490 sets of multidimensional environmental covariate data and proposed prediction models for heavy metal concentrations (HMC) in a soil-rice system, EL-HMC (including RF-HMC and GBM-HMC), based on Random Forest (RF) and Gradient Boosting Machine (GBM) ensemble learning (EL) techniques. To evaluate the effectiveness of each model, multiple linear and Bayesian regressions were selected as benchmark models (BMs), and mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²) were selected as evaluation indicators. In addition, sensitivity and spatial autocorrelation (SAC) analyses were used to examine the robustness of the models. The results showed that the R² values of RF-HMC and GBM-HMC for modeling available cadmium (Cd) concentrations in soil were 0.654 and 0.690, respectively, with an average increase of 48.0% compared with the BMs. The R² values of RF-HMC and GBM-HMC for predicting Cd, lead (Pb), chromium (Cr), and mercury (Hg) concentrations in rice ranged from 0.618 to 0.824 and 0.645 to 0.850, respectively, with an average increase of 58.2% compared with the BMs. The corresponding MAEs and RMSEs of RF-HMC and GBM-HMC were low. Sensitivity analysis of the input features and the SAC of the prediction bias showed that the EL-HMC models have excellent robustness. Therefore, the EL technology-based prediction models for HMCs proposed herein are practical and feasible, demonstrating better accuracy and stability than the traditional models.
This study verifies the application potential of EL technology in pollution ecology and provides a new perspective and solution for sustainable management and precise prevention of heavy metal pollution in farmland soil at the regional scale.
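The error metrics named in this abstract follow standard definitions, sketched below in plain Python. The `relative_gain_percent` helper shows one plausible way an "average increase compared with the BMs" could be computed, as a relative change in R²; the abstract does not give the formula, so this is an assumption, and all function names are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_gain_percent(model_score, benchmark_score):
    """Relative improvement of a model score over a benchmark, in percent.

    Assumption: the gain is measured relative to the benchmark value.
    """
    return 100.0 * (model_score - benchmark_score) / benchmark_score
```

For instance, with made-up scores, a model R² of 0.75 against a benchmark R² of 0.5 gives a relative gain of 50%.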
Affiliation(s)
- Huijuan Hao
  - Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Panpan Li
  - Information Centre, PLA Strategic Support Force Characteristic Medical Center, Beijing 100101, PR China
- Wentao Jiao
  - Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, PR China
- Dabing Ge
  - College of Resources and Environment, Hunan Agricultural University, Changsha 410128, PR China
- Chengwei Hu
  - Information Centre, PLA Strategic Support Force Characteristic Medical Center, Beijing 100101, PR China
- Jing Li
  - Department of Oncology, Huludao Central Hospital, Huludao 125001, PR China
- Yuntao Lv
  - Risk assessment Laboratory for Environmental Factors of Agro-product Quality Safety, Ministry of Agriculture and villages, Changsha 410005, PR China
- Wanming Chen
  - Risk assessment Laboratory for Environmental Factors of Agro-product Quality Safety, Ministry of Agriculture and villages, Changsha 410005, PR China
5
Yan X, Yue T, Winkler DA, Yin Y, Zhu H, Jiang G, Yan B. Converting Nanotoxicity Data to Information Using Artificial Intelligence and Simulation. Chem Rev 2023. [PMID: 37262026 DOI: 10.1021/acs.chemrev.3c00070] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 06/03/2023]
Abstract
Decades of nanotoxicology research have generated extensive and diverse data sets. However, data is not equal to information. The question is how to extract critical information buried in vast data streams. Here we show that artificial intelligence (AI) and molecular simulation play key roles in transforming nanotoxicity data into critical information, i.e., constructing the quantitative nanostructure (physicochemical properties)-toxicity relationships, and elucidating the toxicity-related molecular mechanisms. For AI and molecular simulation to realize their full impacts in this mission, several obstacles must be overcome. These include the paucity of high-quality nanomaterials (NMs) and standardized nanotoxicity data, the lack of model-friendly databases, the scarcity of specific and universal nanodescriptors, and the inability to simulate NMs at realistic spatial and temporal scales. This review provides a comprehensive and representative, but not exhaustive, summary of the current capability gaps and tools required to fill these formidable gaps. Specifically, we discuss the applications of AI and molecular simulation, which can address the large-scale data challenge for nanotoxicology research. The need for model-friendly nanotoxicity databases, powerful nanodescriptors, new modeling approaches, molecular mechanism analysis, and design of the next-generation NMs are also critically discussed. Finally, we provide a perspective on future trends and challenges.
Affiliation(s)
- Xiliang Yan
  - Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Tongtao Yue
  - Key Laboratory of Marine Environment and Ecology, Ministry of Education, Institute of Coastal Environmental Pollution Control, Ocean University of China, Qingdao 266100, China
- David A Winkler
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
  - School of Pharmacy, University of Nottingham, Nottingham NG7 2QL, U.K.
  - Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
- Yongguang Yin
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Hao Zhu
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Guibin Jiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Bing Yan
  - Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China