1
Ou J, Wen J, Tan W, Luo X, Cai J, He X, Zhou L, Yuan Y. A data-driven approach for understanding the structure dependence of redox activity in humic substances. ENVIRONMENTAL RESEARCH 2023; 219:115142. [PMID: 36566968 DOI: 10.1016/j.envres.2022.115142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 12/03/2022] [Accepted: 12/17/2022] [Indexed: 06/17/2023]
Abstract
Humic substances (HS) can facilitate electron transfer during biogeochemical processes due to their redox properties, but the structure-redox activity relationships are still difficult to describe and poorly understood. Herein, linear (partial least squares regression; PLS) and nonlinear (artificial neural network; ANN) models were applied to monitor the structure dependence of HS redox activities in terms of electron accepting (EAC), electron donating (EDC) and overall electron transfer capacities (ETC) using their physicochemical features as input variables. The PLS model exhibited a moderate ability with R² values of 0.60, 0.53 and 0.65 to evaluate EAC, EDC and ETC, respectively. The variable influence in the projection (VIP) scores of the PLS identified that the phenols, quinones and aromatic systems were particularly important for describing the redox activities of HS. Compared with the PLS model, the back-propagation ANN model achieved higher performance with R² values of 0.81, 0.65 and 0.78 for monitoring the EAC, EDC and ETC, respectively. Sensitivity analysis of the ANN separately identified that the EAC highly depended on quinones, aromatics and protein-like fluorophores, while the EDC depended on phenols, aromatics and humic-like fluorophores (or stable free radicals). Additionally, carboxylic groups were the best indicator for evaluating both the EAC and EDC. Good model performances were obtained from the selected features via the PLS and sensitivity analysis, further confirming the accuracy of describing the structure-redox activity relationships with these analyses. This study provides a potential approach for identifying the structure-activity relationships of HS and an efficient machine-learning model for predicting HS redox activities.
Affiliation(s)
- Jiajun Ou
  - School of Automation, Guangdong University of Technology, Guangzhou, 510006, China
- Junlin Wen
  - Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, Guangdong University of Technology, Guangzhou, 510006, China; School of Environmental Science and Engineering, Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou, 510006, China
- Wenbing Tan
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China; State Environmental Protection Key Laboratory of Simulation and Control of Groundwater Pollution, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Xiaoshan Luo
  - Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, Guangdong University of Technology, Guangzhou, 510006, China; School of Environmental Science and Engineering, Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou, 510006, China
- Jiexuan Cai
  - Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, Guangdong University of Technology, Guangzhou, 510006, China; School of Environmental Science and Engineering, Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou, 510006, China
- Xiaosong He
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China; State Environmental Protection Key Laboratory of Simulation and Control of Groundwater Pollution, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Lihua Zhou
  - School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Guangzhou 510006, China
- Yong Yuan
  - Guangdong Key Laboratory of Environmental Catalysis and Health Risk Control, Guangdong University of Technology, Guangzhou, 510006, China; School of Environmental Science and Engineering, Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou, 510006, China
2
Nguyen KTN, François B, Balasubramanian H, Dufour A, Brown C. Prediction of water quality extremes with composite quantile regression neural network. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023; 195:284. [PMID: 36625976 DOI: 10.1007/s10661-022-10870-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 12/17/2022] [Indexed: 06/17/2023]
Abstract
Water quality extremes, which water quality models often struggle to predict, are a grave concern to water supply facilities. Most existing water quality models use mean error functions to maximize the predictability of the mean water quality value. This paper describes a composite quantile regression neural network (CQRNN) model, which simultaneously estimates non-crossing regression quantiles by minimizing the composite quantile regression error function. This method can improve the prediction of extremes. This paper evaluates the performance of CQRNN for predicting extreme values of turbidity and total organic carbon (TOC) and compares it with quantile regression (QR), linear regression (LR), and k-nearest neighbors (KNN) in an application to the Hetch Hetchy Regional Water System, which is the primary water supply for San Francisco, CA. CQRNN is superior to QR, LR, and KNN for predicting the mean trend and extremes of turbidity and TOC, especially for the non-Gaussian turbidity data. The performance of CQRNN is the most stable relative to other methods over different training sample sizes.
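As a rough illustration of the error function this abstract refers to, the sketch below computes the quantile (pinball) loss for one quantile level and averages it over several levels. It is a minimal sketch under stated assumptions, not the authors' CQRNN implementation: the function names `pinball_loss` and `composite_quantile_loss` are hypothetical, and neither the neural network nor the non-crossing constraint is modelled here.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for a single quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        u = y - q  # residual: positive when the prediction is too low
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(y_true)


def composite_quantile_loss(y_true, preds_by_tau):
    """Average the pinball loss over several quantile levels.

    preds_by_tau maps each quantile level tau to a list of predictions
    (an assumed data layout chosen for this sketch).
    """
    losses = [pinball_loss(y_true, preds, tau)
              for tau, preds in preds_by_tau.items()]
    return sum(losses) / len(losses)
```

For a high quantile such as tau = 0.9, predicting too low costs nine times more than predicting too high by the same amount, which is why minimizing this kind of loss pulls the fitted curve toward the upper tail rather than the mean.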
Affiliation(s)
- Khanh Thi Nhu Nguyen
  - Department of Civil and Environmental Engineering, University of Massachusetts Amherst, 130 Natural Resources Road, Amherst, MA, 01003-9303, USA
- Baptiste François
  - Department of Civil and Environmental Engineering, University of Massachusetts Amherst, 130 Natural Resources Road, Amherst, MA, 01003-9303, USA
- Hari Balasubramanian
  - Department of Mechanical and Industrial Engineering, University of Massachusetts Amherst, 160 Governors Drive, Amherst, MA, 01003-2210, USA
- Alexis Dufour
  - Climate Risk and Resilience, WSP, 1600 Boulevard René-Lévesque West, 11th Floor, Québec, H3H 1P9, Montréal, Canada
- Casey Brown
  - Department of Civil and Environmental Engineering, University of Massachusetts Amherst, 130 Natural Resources Road, Amherst, MA, 01003-9303, USA
3
Valenca R, Garcia L, Espinosa C, Flor D, Mohanty SK. Can water composition and weather factors predict fecal indicator bacteria removal in retention ponds in variable weather conditions? THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 838:156410. [PMID: 35662595 DOI: 10.1016/j.scitotenv.2022.156410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Revised: 05/16/2022] [Accepted: 05/30/2022] [Indexed: 06/15/2023]
Abstract
Retention ponds provide benefits including flood control, groundwater recharge, and water quality improvement, but changes in weather conditions could limit their effectiveness in improving microbial water quality metrics. The concentration of fecal indicator bacteria (FIB), which is used as a regulatory standard to assess microbial water quality in retention ponds, could vary widely based on many factors including local weather and influent water chemistry and composition. In this critical review, we analyzed 7421 data points collected from 19 retention ponds across North America listed in the International Stormwater BMP Database to examine if variable FIB removal under field conditions can be predicted based on changes in these weather and water composition factors. Our analysis confirms that FIB removal in retention ponds is sensitive to weather conditions or seasons, but temperature and precipitation data may not describe the variable FIB removal. These weather conditions affect suspended solid and nutrient concentrations, which in turn could affect FIB concentration in the ponds. Removal of total suspended solids and total P only explained 5% and 12% of FIB removal data, respectively, and TN removal had no correlation with FIB removal. These results indicate that regression-based modeling with a single parameter as input has limited use to predict FIB removal due to the interactive nature of their effects on FIB removal. In contrast, machine learning algorithms such as the random forest method were able to predict 65% of the data. The overall analysis indicates that the machine learning model could play a critical role in predicting microbial water quality of surface waters under complex conditions where the variation of both water composition and weather conditions could render regression-based modeling less effective.
Affiliation(s)
- Renan Valenca
  - Department of Civil and Environmental Engineering, University of California Los Angeles, CA, USA
- Lilly Garcia
  - Department of Civil and Environmental Engineering, University of California Los Angeles, CA, USA
- Christina Espinosa
  - Department of Civil and Environmental Engineering, University of California Los Angeles, CA, USA
- Dilara Flor
  - Department of Civil and Environmental Engineering, University of California Los Angeles, CA, USA
- Sanjay K Mohanty
  - Department of Civil and Environmental Engineering, University of California Los Angeles, CA, USA
4
Prediction of Nitrate and Phosphorus Concentrations Using Machine Learning Algorithms in Watersheds with Different Landuse. WATER 2021. [DOI: 10.3390/w13213096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
Rapid industrialization and population growth have elevated the concerns over water quality. Excessive nitrates and phosphates in the water system have an adverse effect on the aquatic ecosystem. In recent years, machine learning (ML) algorithms have been extensively employed to estimate water quality over traditional methods. In this study, the performance of nine different ML algorithms is evaluated to predict nitrate and phosphorus concentration for five different watersheds with different land-use practices. The land-use distribution affects the model performance for all methods. In urban watersheds, the regular and predictable nature of nitrate concentration from wastewater treatment plants results in more accurate estimates. For the nitrate prediction, ANN outperforms other ML models for the urban and agricultural watersheds, while RT-BO performs well for the forested Grand watershed. For the total phosphorus prediction, ensemble-BO and M-SVM outperform other ML models for the agricultural and forested watershed, while the ANN performs better than other ML models for the urban Cuyahoga watershed. In predicting phosphorus concentration, the model predictability is better for agricultural and forested watersheds. Regarding consistency, Bayesian optimized RT, ensemble, and GPR consistently yielded good performance for all watersheds. The methodology and results outlined in this study will assist policymakers in accurately predicting nitrate and phosphorus concentration which will be instrumental in drafting a proper plan to deal with the problem of water pollution.
5
Li S, Bhattarai R, Cooke RA, Verma S, Huang X, Markus M, Christianson L. Relative performance of different data mining techniques for nitrate concentration and load estimation in different type of watersheds. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2020; 263:114618. [PMID: 33618470 DOI: 10.1016/j.envpol.2020.114618] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/18/2019] [Revised: 04/14/2020] [Accepted: 04/14/2020] [Indexed: 06/12/2023]
Abstract
The increasing availability of water quality datasets has led to a greater focus on hydrologic and water quality analysis, thus requiring more efficient and accurate modelling methods. Data mining techniques have been increasingly used for water quality analysis and prediction of the concentration and load of nitrogen pollutants instead of more traditional simulation methods. In this study, we tested the multilayer perceptron (MLP), k-nearest neighbor (k-NN), random forest (RF), and reduced error pruning tree (REPTree) methods, along with the traditional linear regression, to predict nitrate levels based on long-term data from six watersheds with different land-use practices in the midwestern United States. Both the concentration and load results indicated that REPTree had the best performance, with an R² of 0.61-0.85 and a relative absolute error of <75.8%. The different watershed types, however, influenced the performance of the data mining methods, where all four methods showed a higher accuracy for urban-dominated watersheds and lower accuracy for agricultural and forest watersheds. Out of these four methods, classification tree methods (REPTree and RF) performed better than cluster methods (MLP and k-NN) for agricultural and forested watersheds. Our results indicated that both the data structure based on the dominant land use and type of algorithmic method should be carefully considered for selecting a data mining method to predict nitrate concentration and load for a watershed.
Affiliation(s)
- Shiyang Li
  - College of Environmental Science and Engineering, State Key Laboratory of Pollution Control and Resource Reuse, Ministry of Education Key Laboratory of Yangtze River Water Environment, Tongji University, Shanghai, 200092, People's Republic of China
- Rabin Bhattarai
  - Department of Agricultural and Biological Engineering, University of Illinois at Urbana Champaign, 1304 W Pennsylvania Ave #338, Urbana, IL, 61801, USA
- Richard A Cooke
  - Department of Agricultural and Biological Engineering, University of Illinois at Urbana Champaign, 1304 W Pennsylvania Ave #338, Urbana, IL, 61801, USA
- Siddhartha Verma
  - Department of Agricultural and Biological Engineering, University of Illinois at Urbana Champaign, 1304 W Pennsylvania Ave #338, Urbana, IL, 61801, USA
- Xiangfeng Huang
  - College of Environmental Science and Engineering, State Key Laboratory of Pollution Control and Resource Reuse, Ministry of Education Key Laboratory of Yangtze River Water Environment, Tongji University, Shanghai, 200092, People's Republic of China
- Momcilo Markus
  - Prairie Research Institute, Illinois State Water Survey, 2204 Griffith Dr., Champaign, IL, 61820, USA
- Laura Christianson
  - Department of Crop Sciences, University of Illinois at Urbana Champaign, AW-101 Turner Hall, 1102 South Goodwin Avenue, Urbana, IL, 61801, USA
6
Sotomayor G, Hampel H, Vázquez RF. Water quality assessment with emphasis in parameter optimisation using pattern recognition methods and genetic algorithm. WATER RESEARCH 2018; 130:353-362. [PMID: 29248805 DOI: 10.1016/j.watres.2017.12.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/08/2017] [Revised: 12/06/2017] [Accepted: 12/07/2017] [Indexed: 06/07/2023]
Abstract
An unsupervised (k-means) and a supervised (k-Nearest Neighbour in combination with genetic algorithm optimisation, k-NN/GA) pattern recognition algorithm were applied for evaluating and interpreting a large complex matrix of water quality (WQ) data collected during five years (2008, 2010-2013) in the Paute river basin (southern Ecuador). Twenty-one physical, chemical and microbiological parameters collected at 80 different WQ sampling stations were examined. At first, the k-means algorithm was carried out to identify classes of sampling stations regarding their associated WQ status by considering three internal validation indexes, i.e., Silhouette coefficient, Davies-Bouldin and Caliński-Harabasz. As a result, two WQ classes were identified, representing low (C1) and high (C2) pollution. The k-NN/GA algorithm was applied on the available data to construct a classification model with the two WQ classes, previously defined by the k-means algorithm, as the dependent variables and the 21 physical, chemical and microbiological parameters being the independent ones. This algorithm led to a significant reduction of the multidimensional space of independent variables to only nine, which are likely to explain most of the structure of the two identified WQ classes. These parameters are, namely, electric conductivity, faecal coliforms, dissolved oxygen, chlorides, total hardness, nitrate, total alkalinity, biochemical oxygen demand and turbidity. Further, the land use cover of the study basin revealed a very good agreement with the WQ spatial distribution suggested by the k-means algorithm, confirming the credibility of the main results of the used WQ data mining approach.
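One of the internal validation indexes named in this abstract, the Silhouette coefficient, can be sketched in a few lines. The code below is an illustrative, standard-library version and not the authors' code: the function name `silhouette_score`, the tuple-based data layout, and the choice of Euclidean distance are assumptions made for this sketch.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient for a labelled set of points.

    points: list of equal-length numeric tuples; labels: one cluster id per point.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so it does not affect the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster. Comparing the score across candidate numbers of clusters is one common way to choose how many classes to keep.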
Affiliation(s)
- Gonzalo Sotomayor
  - Laboratorio de Ecología Acuática, Departamento de Recursos Hídricos y Ciencias Ambientales, Universidad de Cuenca, Av. 12 de Abril S/N, Cuenca, Ecuador
- Henrietta Hampel
  - Facultad de Ciencias Químicas, Universidad de Cuenca, Av. 12 de Abril S/N, Cuenca, Ecuador; Laboratorio de Ecología Acuática, Departamento de Recursos Hídricos y Ciencias Ambientales, Universidad de Cuenca, Av. 12 de Abril S/N, Cuenca, Ecuador
- Raúl F Vázquez
  - Facultad de Ingeniería, Universidad de Cuenca, Av. 12 de Abril S/N, Cuenca, Ecuador; Laboratorio de Ecología Acuática, Departamento de Recursos Hídricos y Ciencias Ambientales, Universidad de Cuenca, Av. 12 de Abril S/N, Cuenca, Ecuador
7
Classification of water quality status based on minimum quality parameters: application of machine learning techniques. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/s40808-017-0406-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 10/18/2022]
8
Suchetana B, Rajagopalan B, Silverstein J. Assessment of wastewater treatment facility compliance with decreasing ammonia discharge limits using a regression tree model. THE SCIENCE OF THE TOTAL ENVIRONMENT 2017; 598:249-257. [PMID: 28441603 DOI: 10.1016/j.scitotenv.2017.03.236] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/20/2017] [Revised: 03/08/2017] [Accepted: 03/25/2017] [Indexed: 05/13/2023]
Abstract
A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits, the predicted ammonia concentration relative to the discharge limit increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants, providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in the future.
Affiliation(s)
- Bihu Suchetana
  - Department of Civil, Environmental and Architectural Engineering, University of Colorado, 428 UCB, Boulder, CO 80309, USA
- Balaji Rajagopalan
  - Department of Civil, Environmental and Architectural Engineering, University of Colorado, 428 UCB, Boulder, CO 80309, USA; Cooperative Institute for Research in Environmental Sciences, University of Colorado, CIRES Building, Rm 318, Boulder, CO 80309, USA
- JoAnn Silverstein
  - Department of Civil, Environmental and Architectural Engineering, University of Colorado, 428 UCB, Boulder, CO 80309, USA
9
Sakizadeh M. Assessment the performance of classification methods in water quality studies, A case study in Karaj River. ENVIRONMENTAL MONITORING AND ASSESSMENT 2015; 187:573. [PMID: 26275762 DOI: 10.1007/s10661-015-4761-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/14/2014] [Accepted: 07/20/2015] [Indexed: 06/04/2023]
Abstract
To show the performance of classification methods in water quality studies, linear discriminant and Naïve Bayesian classification methods were applied at nine sampling stations with respect to four parameters including COD, nitrite, nitrate, and total coliforms (selected from ten water quality variables) in Karaj River, Iran. To fulfill the goals of this study, the sampling stations were first separated into two groups using cluster analysis. Rural wastewater was the main source of pollution in the first group, whereas the quality of water in the second group has been degraded mainly by organic and agricultural pollution. In order to have an independent group against which the performance of other classification methods is considered, three cross-validation methods including twofold, leave-one-out, and holdout methods were utilized to retain an independent test set. The results of cross-validation for the linear discriminant analysis show that, except for the leave-one-out method with 11.1% misclassification error, the overall performance has been the same as that of the training data set. Therefore, it outperformed the Naïve Bayesian classification method. However, in situations where the correlation among the parameters is low, the latter method can offer the same performance as linear discriminant analysis. A sensitivity analysis was implemented using ten water quality variables (pH, COD, EC, TDA, turbidity, nitrate, nitrite, sulfate, TC, and FC) to find the most important variables in the classification of Karaj River, showing that turbidity, next to COD, pH, nitrate, and sulfate, made the largest contribution to the classification.
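The leave-one-out procedure mentioned in this abstract can be sketched as follows. With nine sampling stations, a single misclassified held-out station gives 1/9 ≈ 11.1%, which matches the reported error; that reading is an inference from the numbers in the abstract, not something the abstract states. The classifier here is a nearest-centroid stand-in rather than linear discriminant analysis or Naïve Bayes, and all function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (stand-in classifier)."""
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    best_label, best_dist = None, float("inf")
    for lab, acc in sums.items():
        # squared Euclidean distance from x to the centroid of class `lab`
        d = sum((v / counts[lab] - xj) ** 2 for v, xj in zip(acc, x))
        if d < best_dist:
            best_label, best_dist = lab, d
    return best_label


def leave_one_out_error(xs, ys, predict):
    """Hold out each sample once, train on the rest, and return the error rate."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)
```

Because each sample is tested by a model that never saw it, the resulting error rate is an estimate of out-of-sample performance, at the cost of refitting the classifier once per sample.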
Affiliation(s)
- Mohamad Sakizadeh
  - Department of Environmental Sciences, Faculty of Sciences, Shahid Rajaee Teacher Training University, Shahid Shabanloo Avenue, Lavizan, Tehran, Iran