1
Li Y, Tao C, Fu D, Jafvert CT, Zhu T. Integrating molecular descriptors for enhanced prediction: Shedding light on the potential of pH to model hydrated electron reaction rates for organic compounds. CHEMOSPHERE 2024; 349:140984. [PMID: 38122944 DOI: 10.1016/j.chemosphere.2023.140984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 12/13/2023] [Accepted: 12/14/2023] [Indexed: 12/23/2023]
Abstract
The hydrated electron reaction rate constant (ke-aq) is an important parameter for determining reductive degradation efficiency and mitigating the ecological risk of organic compounds (OCs). However, OC species morphology and the concentration of hydrated electrons (e-aq) in water vary with pH, complicating OC fate assessment. This study introduced pH as an environmental variable to develop models for ke-aq from 701 data points using 3 descriptor types: (i) molecular descriptors (MD), (ii) quantum chemical descriptors (QCD), and (iii) the combination of both (MD + QCD). Models were screened using 2 descriptor screening methods (MLR and RF) and 14 machine learning (ML) algorithms. The introduction of QCDs, which characterize the electronic structure of OCs, greatly improved model performance while requiring fewer descriptors. The optimal model, MLR-XGBoost(MD + QCD), which included pH, achieved the most satisfactory prediction: R²tra = 0.988, Q²boot = 0.861, R²test = 0.875 and Q²test = 0.873. The mechanistic interpretation using the SHAP method further revealed that QCDs, polarizability, volume, and pH strongly influenced the reductive degradation of OCs by e-aq. Overall, the electrochemical parameters (QCDs, pH) related to the solvent and solute are significant and should be considered in any future ML modeling that assesses the fate of OCs in aquatic environments.
Affiliation(s)
- Yi Li
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
- Cuicui Tao
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
- Dafang Fu
  - School of Civil Engineering, Southeast University, Nanjing, 210096, China
- Chad T Jafvert
  - Lyles School of Civil Engineering, and Environmental & Ecological Engineering, Purdue University, West Lafayette, IN, 47907, USA
- Tengyi Zhu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
2
Zhu T, Zhang Y, Tao C, Chen W, Cheng H. Prediction of organic contaminant rejection by nanofiltration and reverse osmosis membranes using interpretable machine learning models. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 857:159348. [PMID: 36228787 DOI: 10.1016/j.scitotenv.2022.159348] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 07/22/2022] [Revised: 09/21/2022] [Accepted: 10/06/2022] [Indexed: 06/16/2023]
Abstract
Improving the efficiency of contaminant removal by nanofiltration (NF) and reverse osmosis (RO) membranes is a multidimensional process involving membrane material selection and experimental condition optimization. It is unrealistic to explore the contributions of diverse influencing factors to the removal rate by trial-and-error experimentation. However, advanced machine learning (ML) is a powerful tool to simulate this complex decision-making process. Here, 4 traditional learning algorithms (MLR, SVM, ANN, kNN) and 4 ensemble learning algorithms (RF, GBDT, XGBoost, LightGBM) were applied to predict the removal efficiency of contaminants. The results demonstrate that ensemble models showed significantly better predictive performance than traditional models. More importantly, this study achieved a compelling tradeoff between accuracy and interpretability for ensemble models with an effective model interpretation approach, which revealed the mutual interaction mechanism between the membrane material, contaminants and experimental conditions in membrane separation. Additionally, feature selection was for the first time achieved based on the aforementioned model interpretation method to determine the most important variables influencing the contaminant removal rate. Ultimately, the four ensemble models retrained on the selected variables achieved distinguished prediction performance (R²adj = 92.4%-99.5%). MWCO (membrane molecular weight cut-off), McGowan volume of solute (V) and molecular weight (MW) of the compound were demonstrated to be the most important influencing factors in contaminant removal by the NF and RO processes. Overall, the methods proposed in this study can facilitate versatile complex decision-making processes in the environmental field, particularly in contaminant removal by advanced physicochemical separation processes.
Affiliation(s)
- Tengyi Zhu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
- Yu Zhang
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
- Cuicui Tao
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
- Wenxuan Chen
  - School of Civil Engineering, Southeast University, Nanjing 210096, China
- Haomiao Cheng
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
3
Zhu T, Chen Y, Tao C. Multiple machine learning algorithms assisted QSPR models for aqueous solubility: Comprehensive assessment with CRITIC-TOPSIS. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 857:159448. [PMID: 36252662 DOI: 10.1016/j.scitotenv.2022.159448] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/08/2022] [Revised: 10/06/2022] [Accepted: 10/11/2022] [Indexed: 06/16/2023]
Abstract
As an essential environmental property, aqueous solubility quantifies the hydrophobicity of a compound. It can be further used to evaluate the ecological risk and toxicity of organic pollutants. Concerned about the proliferation of organic contaminants in water and the associated technical burden, researchers have developed QSPR models to predict aqueous solubility. However, there are no standard procedures or best practices for comprehensively evaluating models. Hence, the CRITIC-TOPSIS comprehensive assessment method, based on a variety of statistical parameters, is proposed here for the first time in the environmental model research field. Thirty-nine models based on 13 ML algorithms (belonging to 4 tribes) and 3 descriptor screening methods were developed to reliably calculate aqueous solubility values (log Kws) for organic chemicals and to verify the effectiveness of the comprehensive assessment method. The evaluations showed that the MLR-1, XGB-1, DNN-1, and kNN-1 models had better predictive accuracy and external competitiveness than the other prediction models in each tribe. Further, the XGB model based on SRM (XGB-1, C = 0.599) was selected as the optimal pathway for predicting aqueous solubility. We hope that the proposed comprehensive evaluation approach can act as a promising tool for selecting the optimum environmental property prediction methods.
Affiliation(s)
- Tengyi Zhu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
- Ying Chen
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
- Cuicui Tao
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
4
Tong X, Mohapatra S, Zhang J, Tran NH, You L, He Y, Gin KYH. Source, fate, transport and modelling of selected emerging contaminants in the aquatic environment: Current status and future perspectives. WATER RESEARCH 2022; 217:118418. [PMID: 35417822 DOI: 10.1016/j.watres.2022.118418] [Citation(s) in RCA: 105] [Impact Index Per Article: 35.0] [Received: 09/15/2021] [Revised: 02/07/2022] [Accepted: 04/04/2022] [Indexed: 06/14/2023]
Abstract
The occurrence of emerging contaminants (ECs), such as pharmaceuticals and personal care products (PPCPs), perfluoroalkyl and polyfluoroalkyl substances (PFASs) and endocrine-disrupting chemicals (EDCs), in aquatic environments represents a major threat to water resources due to their potential risks to the ecosystem and humans even at trace levels. Mathematical modelling can be a useful and comprehensive approach to studying their fate and transport in natural waters. However, modelling studies of the occurrence, fate and transport of ECs in aquatic environments have generally received far less attention than the more widespread field and laboratory studies. In this study, we reviewed the current status of modelling ECs based on selected representative ECs, including their sources, fate and various mechanisms as well as their interactions with the surrounding environments in aquatic ecosystems, and explored future development and perspectives in this area. Most importantly, the principles, mathematical derivations, ongoing development and applications of various EC models in different geographical regions are critically reviewed and discussed. Recommendations for improving data quality, monitoring planning, model development and applications are also offered. The outcomes of this review can lay down a future framework for developing a comprehensive EC modelling approach to help researchers and policymakers effectively manage water resources impacted by rising levels of ECs.
Affiliation(s)
- Xuneng Tong
  - Department of Civil & Environmental Engineering, National University of Singapore, 1 Engineering Drive 2, Singapore 117576, Singapore
- Sanjeeb Mohapatra
  - NUS Environmental Research Institute, National University of Singapore, 1 Create way, Create Tower, #15-02, Singapore 138602, Singapore
- Jingjie Zhang
  - NUS Environmental Research Institute, National University of Singapore, 1 Create way, Create Tower, #15-02, Singapore 138602, Singapore; Shenzhen Municipal Engineering Lab of Environmental IoT Technologies, Southern University of Science and Technology, Shenzhen, 518055, China; Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
- Ngoc Han Tran
  - NUS Environmental Research Institute, National University of Singapore, 1 Create way, Create Tower, #15-02, Singapore 138602, Singapore
- Luhua You
  - NUS Environmental Research Institute, National University of Singapore, 1 Create way, Create Tower, #15-02, Singapore 138602, Singapore
- Yiliang He
  - School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Karina Yew-Hoong Gin
  - Department of Civil & Environmental Engineering, National University of Singapore, 1 Engineering Drive 2, Singapore 117576, Singapore; NUS Environmental Research Institute, National University of Singapore, 1 Create way, Create Tower, #15-02, Singapore 138602, Singapore
5
Zhu T, Chen W, Gu Y, Jafvert CT, Fu D. Polyethylene-water partition coefficients for polychlorinated biphenyls: Application of QSPR predictions models with experimental validation. WATER RESEARCH 2021; 207:117799. [PMID: 34731669 DOI: 10.1016/j.watres.2021.117799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/27/2021] [Revised: 10/01/2021] [Accepted: 10/20/2021] [Indexed: 06/13/2023]
Abstract
The recalcitrance and ecotoxicity of polychlorinated biphenyls (PCBs) in water are international issues of common concern. The partition coefficients of PCBs between low-density polyethylene (LDPE) and water (KPE-w) are significant for assessing their environmental transport and/or fate in the aquatic environment. Even moderately hydrophobic PCBs, however, possess large KPE-w values, which makes direct experimental measurement laborious. Here, based on the combination of quantitative structure-property relationships (QSPRs) and machine-learning algorithms, 10 in-silico models are developed to provide a quick estimate of KPE-w. These models exhibit good fit (R²adj: 0.919-0.975), robustness (Q²LOO: 0.870-0.954) and external prediction performance (Q²ext: 0.880-0.971), providing a fast way to close data gaps where experimental information is limited or absent, especially the RF-2 model. In addition, the models are verified experimentally using a rapid and accurate three-phase system (aqueous phase, surfactant micelles and LDPE). The experimental results for 16 PCBs show that the modeling results agree well with experimental values, with residuals within or approaching ±0.3 log units. Mechanism interpretations imply that the number of chlorine atoms and the number of ortho-substituted chlorines are the parameters with the greatest effect on KPE-w. This result also heightens interest in measuring and predicting the KPE-w values of chemicals containing halogen atoms in water.
Affiliation(s)
- Tengyi Zhu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, P.R. China
- Wenxuan Chen
  - School of Civil Engineering, Southeast University, Nanjing, 210096, P.R. China
- Yuanyuan Gu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, P.R. China
- Chad T Jafvert
  - Lyles School of Civil Engineering, and Environmental & Ecological Engineering, Purdue University, West Lafayette, IN 47907, USA
- Dafang Fu
  - School of Civil Engineering, Southeast University, Nanjing, 210096, P.R. China
6
Zhu T, Cao Z, Singh RP, Cheng H, Chen M. In silico prediction of polyethylene-aqueous and air partition coefficients of organic contaminants using linear and nonlinear approaches. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2021; 289:112437. [PMID: 33812149 DOI: 10.1016/j.jenvman.2021.112437] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/25/2021] [Revised: 03/16/2021] [Accepted: 03/18/2021] [Indexed: 06/12/2023]
Abstract
Low-density polyethylene (LDPE) passive sampling is very attractive for determining chemical concentrations. Crucial to the measurement is the coefficient (KPE) describing partitioning between LDPE and environmental matrices. Datasets of 255, 117 and 190 compounds were collected for three different matrices, i.e., water, air and seawater, respectively. Further, 3 pp-LFER models and 9 QSPR models based on classical multiple linear regression (MLR) coupled with prevalent nonlinear algorithms (artificial neural network, ANN and support vector machine, SVM) were developed to predict LDPE-water (KPE-W), LDPE-air (KPE-A) and LDPE-seawater (KPE-SW) partition coefficients. The developed models have satisfactory predictability (R²adj: 0.805-0.966, 0.963-0.991 and 0.817-0.941; RMSEtra: 0.233-0.565, 0.200-0.406 and 0.260-0.459) and robustness (Q²ext: 0.840-0.943, 0.968-0.984 and 0.797-0.842; RMSEext: 0.308-0.514, 0.299-0.426 and 0.407-0.462) in the three datasets (water, air and seawater), respectively. In particular, the mechanism interpretations revealed that molecular size, hydrophobicity, polarizability, ionization potential, and molecular stability were the most relevant properties governing chemical partitioning between LDPE and environmental matrices. The applicability domains (ADs) assessed here were satisfactory. As such, the derived models can act as intelligent tools to predict unknown KPE values and fill the experimental gaps, which further supports the construction of a large and reliable database to improve understanding of the distribution of organic contaminants in the total environment.
Affiliation(s)
- Tengyi Zhu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
- Zaizhi Cao
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
- Haomiao Cheng
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
- Ming Chen
  - School of Civil Engineering, Southeast University, Nanjing, 210096, China
7
Zhu T, Gu L, Chen M, Sun F. Exploring QSPR models for predicting PUF-air partition coefficients of organic compounds with linear and nonlinear approaches. CHEMOSPHERE 2021; 266:128962. [PMID: 33218721 DOI: 10.1016/j.chemosphere.2020.128962] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/23/2020] [Revised: 11/05/2020] [Accepted: 11/10/2020] [Indexed: 06/11/2023]
Abstract
Partition coefficients are important parameters for measuring the concentration of chemicals by passive sampling devices. Considering the wide application of polyurethane foam (PUF) in passive air sampling, this work developed several quantitative structure-property relationship (QSPR) models to predict PUF-air partition coefficients (KPUF-air) using linear (multiple linear regression, MLR) and non-linear (artificial neural network, ANN and support vector machine, SVM) machine learning methods. All of the models were developed on a dataset of 170 compounds comprising 9 distinct classes. A series of statistical parameters and validation results showed that the models had good prediction ability, robustness and goodness-of-fit. Furthermore, interpretation of the molecular descriptors indicated that ionization potential, molecular bond, hydrophilicity, molecular size and valence electron number had a dominant influence on the adsorption process of chemicals. Overall, the obtained models all have extensive applicability domains, and thus can be used as effective tools to predict the KPUF-air of new organic compounds or those that have not yet been synthesized, which, in turn, could help researchers better understand the mechanistic basis of the adsorption behavior of PUF.
Affiliation(s)
- Tengyi Zhu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
- Liming Gu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
- Ming Chen
  - School of Civil Engineering, Southeast University, Nanjing, 210096, China
- Feng Sun
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
8
Zhu T, Chen W, Singh RP, Cui Y. Versatile in silico modeling of partition coefficients of organic compounds in polydimethylsiloxane using linear and nonlinear methods. JOURNAL OF HAZARDOUS MATERIALS 2020; 399:123012. [PMID: 32544766 DOI: 10.1016/j.jhazmat.2020.123012] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/07/2020] [Revised: 05/15/2020] [Accepted: 05/20/2020] [Indexed: 06/11/2023]
Abstract
The environmental fate, behavior and effects of hazardous organic compounds have recently received great attention in diverse environmental phases, including water, atmosphere, soil and sediment. Because polydimethylsiloxane (PDMS) fibers have been validated for wide application in determining partition behavior in passive sampling, several in silico models were established in this work to predict PDMS-water (KPDMS-w), PDMS-air (KPDMS-a) and PDMS-seawater (KPDMS-sw) partition coefficients of diverse chemicals. This is an attempt to combine a conventional linear method with a popular nonlinear algorithm for the estimation of partition coefficients between PDMS and different environmental media. All of the developed models showed satisfactory goodness-of-fit with a high adjusted correlation coefficient (R²adj) and were validated as robust, stable and predictive by various internal and external validation techniques, yielding a wide series of statistical checks. Moreover, interpretation of the selected descriptors showed that hydrophobicity, polarizability, charge distribution and molecular size of compounds contributed significantly to the model development. Based on the broad applicability domains (ADs), the current study provides suitable tools to fill the experimental data gap for other compounds and to help researchers better understand the mechanistic basis of the adsorption behavior of PDMS.
Affiliation(s)
- Tengyi Zhu
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
- Wenxuan Chen
  - School of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, Jiangsu, China
- Yanran Cui
  - Institute for Integrated Catalysis, Pacific Northwest National Laboratory, P.O. Box 999, Richland, WA 99354, United States