1
Park J, Patel K, Lee WH. Recent advances in algal bloom detection and prediction technology using machine learning. The Science of the Total Environment 2024; 938:173546. [PMID: 38810749 DOI: 10.1016/j.scitotenv.2024.173546] [Received: 11/17/2023] [Revised: 05/18/2024] [Accepted: 05/24/2024] [Indexed: 05/31/2024]
Abstract
Harmful algal blooms (HABs), including red tides and cyanobacteria, are a significant environmental issue that can have harmful effects on aquatic ecosystems and human health. Traditional methods of detecting and managing algal blooms have been limited by their reliance on manual observation and analysis, which can be time-consuming and costly. Recent advances in machine learning (ML) technology have shown promise in improving the accuracy and efficiency of algal bloom detection and prediction. This paper provides an overview of the latest developments in using ML for algal bloom detection and prediction using various water quality parameters and environmental factors. First, we introduce ML for algal bloom prediction using regression and classification models. Then we explore image-based ML for algae detection by utilizing satellite images, surveillance cameras, and microscopic images. This study also highlights several real-world examples of successful implementation of ML for algal bloom detection and prediction. These examples show how ML can enhance the accuracy and efficiency of detecting and predicting algal blooms, contributing to the protection of aquatic ecosystems and human health. The study also outlines recent efforts to enhance the field applicability of ML models and suggests future research directions. The recent interest in explainable artificial intelligence (XAI) is discussed as an effort to understand which environmental factors most influence algal blooms. XAI facilitates interpretations of ML model results, thereby enhancing the models' usability for decision-making in field management and improving their overall applicability in real-world settings. We also emphasize the significance of obtaining high-quality, field-representative data to enhance the efficiency of ML applications. The effectiveness of ML models in detecting and predicting algal blooms can be improved through management strategies for data quality, such as pre-treating missing data and integrating diverse datasets into a unified database. Overall, this paper presents a comprehensive review of the latest advancements in managing algal blooms using ML technology and proposes future research directions to enhance the utilization of ML techniques.
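As one illustration of what "pre-treating missing data" could look like, the sketch below fills interior gaps in a monitoring series by linear interpolation. This is only an assumed example: the abstract does not say which imputation method the reviewed studies use, and the function name `interpolate_gaps` and the chlorophyll-a values are hypothetical.

```python
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None because they have no
    neighbour on one side to interpolate from.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


# Hypothetical daily chlorophyll-a readings with missing days.
chl_a = [4.0, None, None, 10.0, None, 12.0, None]
print(interpolate_gaps(chl_a))
```

Linear interpolation is a reasonable default only for short gaps in slowly varying series; longer gaps or bloom onsets would likely need a model-based approach.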
Affiliation(s)
- Jungsu Park
- Department of Civil and Environmental Engineering, Hanbat National University, 125, Dongseo-daero, Yuseong-gu, Daejeon 34158, Republic of Korea.
- Keval Patel
- Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, United States.
- Woo Hyoung Lee
- Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, United States.
2
Zhong J, Xiao R, Wang P, Yang X, Lu Z, Zheng J, Jiang H, Rao X, Luo S, Huang F. Identifying influence factors and thresholds of the next day's pollen concentration in different seasons using interpretable machine learning. The Science of the Total Environment 2024; 935:173430. [PMID: 38782273 DOI: 10.1016/j.scitotenv.2024.173430] [Received: 12/14/2023] [Revised: 05/19/2024] [Accepted: 05/19/2024] [Indexed: 05/25/2024]
Abstract
The prevalence of pollen allergies is a pressing global issue, with projections suggesting that half of the world's population will be affected by 2050 according to the estimation of the World Health Organization (WHO). Accurately forecasting pollen allergy risks requires identifying key factors and their thresholds for aerosol pollen. To address this, we developed a technical framework combining advanced machine learning and SHapley Additive exPlanations (SHAP) technology, focusing on Beijing. By analyzing meteorological data and vegetation phenology, we identified the factors influencing next-day's pollen concentration (NDP) in Beijing and their thresholds. Our results highlight vegetation phenology data from Synthetic Aperture Radar (SAR), temperature, wind speed, and atmospheric pressure as crucial factors in spring. In contrast, the Normalized Difference Vegetation Index (NDVI), air temperature, and wind speed are significant in autumn. Leveraging SHAP technology, we established season-specific thresholds for these factors. Our study not only confirms previous research but also unveils seasonal variations in the relationship between radar-derived vegetation phenology data and NDP. Additionally, we observe seasonal fluctuations in the influence patterns and threshold values of daily air temperatures on NDP. These insights are pivotal for improving pollen concentration prediction accuracy and managing allergic risks effectively.
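To make the idea behind SHAP attributions concrete, the following sketch computes exact Shapley values for a three-feature toy model by enumerating every feature coalition. It is an illustration under stated assumptions, not the authors' pipeline: `toy_pollen_model`, its coefficients, and the input and baseline values are all hypothetical, and "absent" features are replaced by a single baseline point rather than averaged over a background dataset as SHAP implementations typically do.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature outside the coalition takes its baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_pollen_model(z):
    """Hypothetical stand-in for a trained next-day pollen regressor."""
    temperature, wind_speed, pressure = z
    return (2.0 * temperature + 1.5 * wind_speed
            + 0.5 * temperature * wind_speed - 0.1 * pressure)


x = [20.0, 3.0, 1010.0]         # hypothetical day to explain
baseline = [10.0, 2.0, 1015.0]  # hypothetical reference day
phi = shapley_values(toy_pollen_model, x, baseline)
print(dict(zip(["temperature", "wind_speed", "pressure"], phi)))
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the property that lets per-feature contributions be read against feature values when looking for thresholds.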
Affiliation(s)
- Junhong Zhong
- School of Architecture and Urban Planning, Guangdong University of Technology, Guangzhou 510090, China; School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Rongbo Xiao
- School of Architecture and Urban Planning, Guangdong University of Technology, Guangzhou 510090, China; School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China.
- Peng Wang
- School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China.
- Xiaojun Yang
- Florida State University, Tallahassee 10921, United States
- Zongliang Lu
- School of Public Administration, Guangdong University of Finance and Economics, Guangzhou 510320, China
- Jiatong Zheng
- School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Haiyan Jiang
- School of Architecture and Urban Planning, Guangdong University of Technology, Guangzhou 510090, China
- Xin Rao
- School of Mathematics and Statistics, Guangdong University of Foreign Studies, Guangzhou 510420, China
- Shuhua Luo
- School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China
- Fei Huang
- School of Environmental Science and Engineering, Guangdong University of Technology, Guangzhou 510006, China
3
Qin C, Tian Q, Zhou H, Qin Y, Zhou S, Wu Y, Tianjiao E, Duan S, Li Y, Wang X, Chen Z, Zheng G, Feng F. Detecting Muscle Invasion of Bladder Cancer: An Application of Diffusion Kurtosis Imaging Ratio and Vesical Imaging-Reporting and Data System. J Magn Reson Imaging 2024; 60:54-64. [PMID: 37916908 DOI: 10.1002/jmri.29053] [Received: 04/12/2023] [Revised: 09/27/2023] [Accepted: 09/27/2023] [Indexed: 11/03/2023] Open
Abstract
BACKGROUND: Independent factors are needed to supplement the Vesical Imaging-Reporting and Data System (VI-RADS) to improve its ability to identify muscle-invasive bladder cancer (MIBC).
PURPOSE: To assess the correlation between MIBC and diffusion kurtosis imaging (DKI) ratio, VI-RADS, and other factors (such as tumor location).
STUDY TYPE: Retrospective.
POPULATION: Sixty-eight patients (50 males and 18 females; age: 70.1 ± 9.5 years) with bladder urothelial carcinoma.
FIELD STRENGTH/SEQUENCE: 1.5 T, conventional diffusion-weighted imaging (DWI), and DKI (single shot echo-planar sequence).
ASSESSMENT: Three radiologists independently measured the diffusion parameters of each bladder cancer (BCa) and obturator internus, including the mean apparent diffusion coefficient (ADCmean), mean kurtosis (MK), and mean diffusion (MD). The ratio of each diffusion parameter between BCa and obturator internus was then calculated (diffusion parameter ratio = bladder cancer:obturator internus). Based on the VI-RADS, the target lesions were independently scored. Furthermore, the actual tumor-wall contact length (ACTCL) and absolute tumor-wall contact length (ABTCL) were measured.
STATISTICAL TESTS: Multicollinearity among independent variables was evaluated using the variance inflation factor (VIF). Multivariable logistic regression analysis was used to determine the independent risk factors of MIBC. The receiver operating characteristic curve was used to evaluate the efficacy of each variable in detecting MIBC. The DeLong test was used to compare the area under the curve (AUC). A P < 0.05 was considered statistically significant.
RESULTS: MKratio (median: 0.62) and VI-RADS were independent risk factors for MIBC. AUCs for MKratio, VI-RADS, and MKratio combined with VI-RADS in assessing MIBC were 0.895, 0.871, and 0.973, respectively. MKratio combined with VI-RADS was more effective in diagnosing MIBC than VI-RADS alone.
DATA CONCLUSIONS: MKratio has potential to assist the assessment of MIBC. MKratio can be used as a supplement to VI-RADS for detecting MIBC.
LEVEL OF EVIDENCE: 4
TECHNICAL EFFICACY: Stage 2.
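The two quantities at the centre of this abstract, the tumor-to-muscle parameter ratio and the area under the ROC curve, can be sketched in a few lines. The code below is a minimal illustration, not the study's analysis: the scores and labels are made up, `parameter_ratio` and `roc_auc` are hypothetical helper names, and the AUC is computed with the pairwise (Mann-Whitney) definition under the assumption that a higher score indicates the positive class.

```python
def parameter_ratio(tumor_value, muscle_value):
    """Ratio of a diffusion parameter in the tumor to the reference muscle."""
    return tumor_value / muscle_value


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Assumes higher scores indicate the positive class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores for six lesions (1 = muscle-invasive, 0 = not).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(roc_auc(scores, labels), 3))
```

Comparing two AUCs measured on the same patients, as the abstract does with the DeLong test, needs a paired procedure; the pairwise count above only gives the point estimate.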
Affiliation(s)
- Cai Qin
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Qi Tian
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Hui Zhou
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Yihan Qin
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Siyu Zhou
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Yutao Wu
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Tianjiao E
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Shufeng Duan
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Yueyue Li
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Xiaolin Wang
- Department of Urology Surgery, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Zhigang Chen
- Department of Urology Surgery, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Guihua Zheng
- Department of Pathology, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Feng Feng
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, China
4
Mallick J, Alqadhi S, Hang HT, Alsubih M. Interpreting optimised data-driven solution with explainable artificial intelligence (XAI) for water quality assessment for better decision-making in pollution management. Environmental Science and Pollution Research International 2024:10.1007/s11356-024-33921-7. [PMID: 38884936 DOI: 10.1007/s11356-024-33921-7] [Received: 12/05/2023] [Accepted: 06/03/2024] [Indexed: 06/18/2024]
Abstract
In Saudi Arabia, water pollution and drinking water scarcity pose a major challenge and jeopardise the achievement of sustainable development goals. The urgent need for rapid and accurate monitoring and assessment of water quality requires sophisticated, data-driven solutions for better decision-making in water management. This study aims to develop optimised data-driven models for comprehensive water quality assessment to enable informed decisions that are critical for sustainable water resources management. We used an entropy-weighted arithmetic technique to calculate the Water Quality Index (WQI), which integrates the World Health Organization (WHO) standards for various water quality parameters. Our methodology incorporated advanced machine learning (ML) models, including decision trees, random forests (RF) and correlation analyses to select features essential for identifying critical water quality parameters. We developed and optimised data-driven models such as gradient boosting machines (GBM), deep neural networks (DNN) and RF within the H2O API framework to ensure efficient data processing and handling. Interpretation of these models was achieved through a three-pronged explainable artificial intelligence (XAI) approach: model diagnosis with residual analysis, model parts with permutation-based feature importance and model profiling with partial dependence plots (PDP), accumulated local effects (ALE) plots and individual conditional expectation (ICE) plots. The quantitative results revealed insightful findings: fluoride and residual chlorine had the highest and lowest entropy weights, respectively, indicating their differential effects on water quality. Over 35% of the water samples were categorised as 'unsuitable' for consumption, highlighting the urgency of taking action to improve water quality. Amongst the optimised models, the Random Forest (model 79) and the Deep Neural Network (model 81) proved to be the most effective and showed robust predictive abilities, with R² values of 0.96 and 0.97, respectively, for the testing dataset. Model profiling as XAI highlighted the significant influence of key parameters such as nitrate, total hardness and pH on WQI predictions. These findings enable targeted water quality improvement measures that are in line with sustainable water management goals. Therefore, our study demonstrates the potential of advanced, data-driven methods to revolutionise water quality assessment in Saudi Arabia. By providing a more nuanced understanding of water quality dynamics and enabling effective decision-making, these models contribute significantly to the sustainable management of valuable water resources.
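The entropy-weighted arithmetic index mentioned in this abstract can be sketched as follows. This is a minimal version under assumptions the abstract does not confirm: each parameter column is turned into proportions before the entropy is taken, the quality rating is the concentration as a percentage of its permissible limit, and the sample values are invented for illustration. The limits of 1.5 mg/L for fluoride and 50 mg/L for nitrate are used here as illustrative guideline values.

```python
from math import log


def entropy_weights(matrix):
    """Entropy weights for the columns of a samples x parameters matrix.

    Values must be positive. A column whose values are spread unevenly has
    low entropy and therefore receives a high weight. If every column is
    constant the weights are undefined (division by zero).
    """
    n_samples = len(matrix)
    n_params = len(matrix[0])
    divergence = []
    for j in range(n_params):
        column = [row[j] for row in matrix]
        total = sum(column)
        probs = [v / total for v in column]
        entropy = -sum(p * log(p) for p in probs if p > 0) / log(n_samples)
        divergence.append(1.0 - entropy)
    scale = sum(divergence)
    return [d / scale for d in divergence]


def weighted_wqi(sample, limits, weights):
    """Weighted arithmetic index: sum of weight * (100 * concentration / limit)."""
    return sum(w * 100.0 * c / s for c, s, w in zip(sample, limits, weights))


# Invented samples; columns are fluoride and nitrate in mg/L.
samples = [[0.5, 10.0], [1.5, 11.0], [3.0, 9.0]]
limits = [1.5, 50.0]
weights = entropy_weights(samples)
print([round(weighted_wqi(s, limits, weights), 1) for s in samples])
```

With these invented numbers fluoride dominates the index because its concentrations vary far more across samples than nitrate does, which is the mechanism by which a parameter ends up with the highest entropy weight.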
Affiliation(s)
- Javed Mallick
- Department of Civil Engineering, College of Engineering, King Khalid University, P.O. Box: 394, Abha, 61411, Kingdom of Saudi Arabia.
- Saeed Alqadhi
- Department of Civil Engineering, College of Engineering, King Khalid University, P.O. Box: 394, Abha, 61411, Kingdom of Saudi Arabia
- Hoang Thi Hang
- Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, 110025, India
- Majed Alsubih
- Department of Civil Engineering, College of Engineering, King Khalid University, P.O. Box: 394, Abha, 61411, Kingdom of Saudi Arabia
5
Wu C, Liang Y, Jiang S, Shi Z. Mechanistic and data-driven perspectives on plant uptake of organic pollutants. The Science of the Total Environment 2024; 929:172415. [PMID: 38631647 DOI: 10.1016/j.scitotenv.2024.172415] [Received: 02/17/2024] [Revised: 04/09/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024]
Abstract
Establishing reliable predictive models for plant uptake of organic pollutants is crucial for environmental risk assessment and guiding phytoremediation efforts. This study compiled an expanded dataset of plant cuticle-water partition coefficients (Kcw), a useful indicator for plant uptake, comprising 371 data points for 148 unique compounds and various plant species. Quantum/computational chemistry software and tools were utilized to compute various molecular descriptors, aiming to comprehensively characterize the properties and structures of each compound. Three types of models were developed to predict Kcw: a mechanism-driven pp-LFER model, a data-driven machine learning model, and an integrated mechanism-data-driven model. The mechanism-data-driven GBRT-ppLFER model exhibited superior performance, achieving RMSEtrain = 0.133 and RMSEtest = 0.301 while maintaining interpretability. The Shapley Additive Explanation analysis indicated that the pp-LFER parameter, ESPI, FwRadicalmax, ExtFP607, and RDF70s are the key factors influencing plant uptake in the GBRT-ppLFER model. Overall, the pp-LFER parameter, ESPI, and ExtFP607 show positive effects, while the remaining factors exhibit negative effects. Partial dependency analysis further indicated that plant uptake is not solely determined by individual factors but rather by the combined interactions of multiple factors. Specifically, compounds with pp-LFER parameter > 4, ESPI > -25.5, 0.098 < FwRadicalmax < 0.132, and 2 < RDF70s < 3 are generally more readily taken up by plants. In addition, the predicted Kcw values from the GBRT-ppLFER model were effectively employed to estimate the plant-water partition coefficients and bioconcentration factors across different plant species and growth media (water, sand, and soil), achieving an outstanding performance with an RMSE of 0.497. This study provides effective tools for assessing plant uptake of organic pollutants and deepens our understanding of plant-environment-compound interactions.
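The partial dependency analysis referred to in this abstract averages a model's predictions over the dataset while one feature is pinned to each value on a grid. The sketch below shows that computation for a toy model with an interaction term. It is an assumed illustration: `toy_model`, its coefficients, and the rows are hypothetical and unrelated to the GBRT-ppLFER model or its descriptors.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature fixed to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(z):
    """Hypothetical model in which the two features interact."""
    return 0.5 * z[0] + z[0] * z[1]


rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
print(partial_dependence(toy_model, rows, feature_index=0, grid=[0.0, 2.0, 4.0]))
```

Because the toy model contains a product term, the slope of the curve for feature 0 depends on how feature 1 is distributed in the data, which is the kind of combined effect the abstract attributes to multiple interacting factors.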
Affiliation(s)
- Chunya Wu
- School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China
- Yuzhen Liang
- School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China.
- Shan Jiang
- School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China
- Zhenqing Shi
- School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China
6
Liu T, Zhang H, Wu J, Liu W, Fang Y. Wastewater treatment process enhancement based on multi-objective optimization and interpretable machine learning. Journal of Environmental Management 2024; 364:121430. [PMID: 38875983 DOI: 10.1016/j.jenvman.2024.121430] [Received: 01/18/2024] [Revised: 04/22/2024] [Accepted: 06/07/2024] [Indexed: 06/16/2024]
Abstract
Optimization and control of the wastewater treatment process (WTP) can contribute to cost reduction and efficiency. A wastewater treatment process multi-objective optimization (WTPMO) framework is proposed in this paper to provide suggestions for decision-making in setting parameters of the WTP. Firstly, prediction models based on Extreme Gradient Boosting (XGB) with Bayesian optimization (BO) are developed for predicting effluent water quality (EQ) and energy consumption (EC) for different influent quality and process parameter settings. Then, the SHapley Additive exPlanations (SHAP) algorithm is used to complement the interpretability of machine learning by quantitatively evaluating the impact of different features on the predicted targets. Finally, the Non-dominated Sorting Genetic Algorithm II (NSGA-II) with the Technique for Ordering Preferences on Similarity of Ideal Solutions (TOPSIS) is introduced to solve and make decisions on the multi-objective optimization problem. The applicability of WTPMO is validated on Benchmark Simulation Model 1 (BSM1). The results show that BOXGB achieves accurate prediction for EQ and EC with R² values of 0.923 and 0.965, respectively, indicating that BO can effectively select the model hyperparameters in XGB. SHAP supplements the interpretability of the model by explaining how the influent water quality and decision variables affect the EQ and EC of the WTP. In addition, the optimized process parameters are determined based on NSGA-II and TOPSIS, and the EC optimization rate is 1.552% while guaranteeing water quality compliance. Overall, this research can effectively achieve the optimization of the WTP, ensure that the effluent water quality meets the standards while reducing energy consumption, assist wastewater treatment plants (WWTPs) to achieve more intelligent and efficient operation and maintenance management, and provide strong support for environmental protection and sustainable development goals.
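Once a multi-objective optimizer has produced a set of trade-off solutions, TOPSIS ranks them by closeness to an ideal point. The sketch below is one common formulation (vector normalization followed by Euclidean distances to the best and worst points); the abstract does not state which variant or weights the authors used, and the candidate values and equal weights here are hypothetical.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Closeness scores in [0, 1] for each alternative (row); higher is better.

    benefit[j] is True if criterion j should be maximised, False if minimised.
    Undefined (division by zero) if all alternatives are identical.
    """
    n_criteria = len(matrix[0])
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)]
                for row in matrix]
    columns = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = sqrt(sum((a - b) ** 2 for a, b in zip(row, best)))
        d_worst = sqrt(sum((a - b) ** 2 for a, b in zip(row, worst)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical Pareto candidates: (effluent quality index, energy consumption).
# Both criteria are costs, so lower values are preferred.
candidates = [[6.1, 3700.0], [6.5, 3600.0], [7.2, 3550.0]]
scores = topsis(candidates, weights=[0.5, 0.5], benefit=[False, False])
print(max(range(len(scores)), key=scores.__getitem__), scores)
```

The weights encode how much the operator values effluent quality against energy use, so the selected compromise changes with them; equal weights are only a placeholder.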
Affiliation(s)
- Tianxiang Liu
- National Center of Technology Innovation for Digital Construction, Huazhong University of Science & Technology, Wuhan, Hubei, 430074, China; School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Heng Zhang
- National Center of Technology Innovation for Digital Construction, Huazhong University of Science & Technology, Wuhan, Hubei, 430074, China; School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Junhao Wu
- National Center of Technology Innovation for Digital Construction, Huazhong University of Science & Technology, Wuhan, Hubei, 430074, China; School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Wenli Liu
- National Center of Technology Innovation for Digital Construction, Huazhong University of Science & Technology, Wuhan, Hubei, 430074, China; School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China.
- Yihai Fang
- Department of Civil Engineering, Monash University, Clayton, 3800, Victoria, Australia
7
Ortiz-Lopez C, Bouchard C, Rodriguez MJ. Ensemble machine learning using hydrometeorological information to improve modeling of quality parameter of raw water supplying treatment plants. Journal of Environmental Management 2024; 362:121378. [PMID: 38838533 DOI: 10.1016/j.jenvman.2024.121378] [Received: 02/13/2024] [Revised: 05/03/2024] [Accepted: 06/02/2024] [Indexed: 06/07/2024]
Abstract
Source and raw water quality may deteriorate due to rainfall and river flow events that occur in watersheds. The effects on raw water quality are normally detected in drinking water treatment plants (DWTPs) with a time-lag after these events in the watersheds. Early warning systems (EWSs) in DWTPs require models with high accuracy in order to anticipate changes in raw water quality parameters. Ensemble machine learning (EML) techniques have recently been used for water quality modeling to improve accuracy and decrease variance in the outcomes. We used three decision-tree-based EML models (random forest [RF], gradient boosting [GB], and eXtreme Gradient Boosting [XGB]) to predict two critical parameters for DWTPs, raw water turbidity and UV absorbance (UV254), using rainfall and river flow time series as predictors. When modeling raw water turbidity, the three EML models (r² = 0.87 for RF, 0.80 for GB, and 0.81 for XGB) showed very good performance metrics. For raw water UV254, the three models (r² = 0.89 for RF, 0.85 for GB, and 0.88 for XGB) again showed very good performance metrics. Results from this study suggest that EML approaches could be used in EWSs to anticipate changes in the quality parameters of raw water and enhance decision-making in DWTPs.
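Because raw water quality responds to rainfall and river flow with a delay, a model of this kind is usually fed lagged copies of the predictor series. The sketch below builds such a feature table. It is an assumed illustration: the abstract does not give the lags or the time step, and `make_lagged_rows`, the lag choices, and the sample series are hypothetical.

```python
def make_lagged_rows(rainfall, flow, target, lags):
    """Pair each target value with rainfall and flow observed `lags` steps earlier.

    Returns a list of (features, target_value) tuples. The first max(lags)
    time steps are dropped because their lagged inputs are unavailable.
    """
    rows = []
    for t in range(max(lags), len(target)):
        features = [rainfall[t - k] for k in lags] + [flow[t - k] for k in lags]
        rows.append((features, target[t]))
    return rows


# Made-up daily series: rainfall (mm), river flow (m3/s), raw water turbidity (NTU).
rainfall = [0.0, 5.0, 20.0, 2.0, 0.0, 0.0]
flow = [10.0, 12.0, 30.0, 25.0, 15.0, 11.0]
turbidity = [3.0, 4.0, 6.0, 18.0, 12.0, 7.0]

for features, y in make_lagged_rows(rainfall, flow, turbidity, lags=(1, 2)):
    print(features, "->", y)
```

Each row uses only information available before the target time step, which is what an early warning system needs; the resulting table can be passed to any tree-based regressor.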
Affiliation(s)
- Christian Ortiz-Lopez
- Centre de Recherche en Aménagement et Développement (CRAD), Université Laval, 2325 Allée des Bibliothèques, Québec City, QC, G1V 0A6, Canada.
- Christian Bouchard
- Centre de Recherche en Aménagement et Développement (CRAD), Université Laval, 2325 Allée des Bibliothèques, Québec City, QC, G1V 0A6, Canada
- Manuel J Rodriguez
- École Supérieure d'Aménagement du Territoire et de Développement Régional (ESAD), Université Laval, 2325 Allée des Bibliothèques, Québec City, QC, G1V 0A6, Canada
8
Hong SM, Morgan BJ, Stocker MD, Smith JE, Kim MS, Cho KH, Pachepsky YA. Using machine learning models to estimate Escherichia coli concentration in an irrigation pond from water quality and drone-based RGB imagery data. Water Research 2024; 260:121861. [PMID: 38875854 DOI: 10.1016/j.watres.2024.121861] [Received: 01/11/2024] [Revised: 05/29/2024] [Accepted: 05/30/2024] [Indexed: 06/16/2024]
Abstract
The rapid and efficient quantification of Escherichia coli concentrations is crucial for monitoring water quality. Remote sensing techniques and machine learning algorithms have been used to detect E. coli in water and estimate its concentrations. The application of these approaches, however, is challenged by limited sample availability and unbalanced water quality datasets. In this study, we estimated the E. coli concentration in an irrigation pond in Maryland, USA, during the summer season using demosaiced natural color (red, green, and blue: RGB) imagery in the visible and infrared spectral ranges, and a set of 14 water quality parameters. We did this by deploying four machine learning models - Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGB), and K-nearest Neighbor (KNN) - under three data utilization scenarios: water quality parameters only, combined water quality and small unmanned aircraft system (sUAS)-based RGB data, and RGB data only. To select the training and test datasets, we applied two data-splitting methods: ordinary and quantile data splitting. These methods provided a constant splitting ratio in each decile of the E. coli concentration distribution. Quantile data splitting resulted in better model performance metrics and smaller differences between the metrics for both the training and testing datasets. When trained with quantile data splitting after hyperparameter optimization, models RF, GBM, and XGB had R2 values above 0.847 for the training dataset and above 0.689 for the test dataset. The combination of water quality and RGB imagery data resulted in a higher R2 value (>0.896) for the test dataset. Shapley additive explanations (SHAP) of the relative importance of variables revealed that the visible blue spectrum intensity and water temperature were the most influential parameters in the RF model. Demosaiced RGB imagery served as a useful predictor of E. coli concentration in the studied irrigation pond.
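The quantile data splitting described in this abstract keeps the train/test ratio constant within each decile of the target distribution, so that rare high or low concentrations appear in both sets. A minimal sketch of that idea follows, assuming a random draw inside each bin; the function name `quantile_split`, the 20% test fraction, and the synthetic targets are hypothetical rather than taken from the study.

```python
import random


def quantile_split(targets, test_fraction=0.2, n_bins=10, seed=0):
    """Split sample indices so every quantile bin of the target contributes
    the same fraction of its members to the test set.

    Returns (train_indices, test_indices), each sorted.
    """
    rng = random.Random(seed)
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    train, test = [], []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        bin_indices = order[start:stop]
        rng.shuffle(bin_indices)
        n_test = round(len(bin_indices) * test_fraction)
        test.extend(bin_indices[:n_test])
        train.extend(bin_indices[n_test:])
    return sorted(train), sorted(test)


# Synthetic skewed targets standing in for E. coli concentrations.
targets = [float(i) ** 1.5 for i in range(100)]
train_idx, test_idx = quantile_split(targets)
print(len(train_idx), len(test_idx))
```

An ordinary random split can leave the upper tail of a skewed distribution entirely in one set, which is one plausible reason the stratified variant gave closer training and test metrics.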
Affiliation(s)
- Seok Min Hong
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA; Department of Civil Urban Earth and Environmental Engineering, Ulsan National Institute of Science and Technology, UNIST-gil 50, Ulsan, 44919, South Korea
- Billie J Morgan
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA
- Matthew D Stocker
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA
- Jaclyn E Smith
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA
- Moon S Kim
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA
- Kyung Hwa Cho
- School of Civil, Environmental and Architectural Engineering, Korea University, Seoul, 02841, South Korea.
- Yakov A Pachepsky
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA.
9
Yang GG, Wang Q, Feng J, He L, Li R, Lu W, Liao E, Lai Z. Can three-dimensional nitrate structure be reconstructed from surface information with artificial intelligence? - A proof-of-concept study. The Science of the Total Environment 2024; 924:171365. [PMID: 38458452 DOI: 10.1016/j.scitotenv.2024.171365] [Received: 09/20/2023] [Revised: 02/09/2024] [Accepted: 02/27/2024] [Indexed: 03/10/2024]
Abstract
Nitrate is one of the essential variables in the ocean and a primary control of the upper ocean pelagic ecosystem. Its three-dimensional (3D) structure is vital for understanding ocean dynamics and the ecosystem. Although several gridded nitrate products exist, the possibility of reconstructing the 3D structure of nitrate from surface data has never been exploited. In this study, we employed two advanced artificial intelligence (AI) networks, U-net and Earthformer, to reconstruct nitrate concentration in the Indian Ocean from surface data. Simulation from an ecosystem model was utilized as the labeling data to train and test the AI networks, with wind vectors, wind stress, sea surface temperature, sea surface chlorophyll-a, solar radiation, and precipitation as the input. We compared the performance of the two networks and different pre-processing methods. With the input features decomposed into climatology and anomaly components, the Earthformer achieved optimal reconstruction results with a lower normalized root mean square error (NRMSE = 0.1591), spatially and temporally, outperforming U-net (NRMSE = 0.2007) and the climatology prediction (NRMSE = 0.2089). Furthermore, Earthformer was more capable of identifying interannual nitrate anomalies. With a network interpretation technique, we quantified the spatio-temporal importance of every input feature in the best case (Earthformer with decomposed inputs). The influence of different input features on nitrate concentration in the adjacent Java Sea exhibited seasonal variation, stronger than the interannual one. The feature importance highlighted the role of dynamic factors, particularly the wind, matching our understanding of the dynamic controls of the ecosystem. Our reconstruction and network interpretation technique can be extended to other ecosystem variables, providing new possibilities in studies of marine environment and ecology from an AI perspective.
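Two ideas in this abstract lend themselves to a short sketch: splitting a series into a repeating climatology plus an anomaly, and scoring a reconstruction with a normalized error. The code below is an assumed illustration only. It uses a monthly climatology on a single time series, and it normalizes the root mean square error by the standard deviation of the reference, which is one of several conventions; the abstract does not say which normalization the authors chose.

```python
from math import sqrt


def decompose(series, period=12):
    """Split a series into a per-phase mean (climatology) and the residual anomaly."""
    climatology = []
    for phase in range(period):
        values = series[phase::period]
        climatology.append(sum(values) / len(values))
    anomaly = [v - climatology[i % period] for i, v in enumerate(series)]
    return climatology, anomaly


def nrmse(predicted, reference):
    """Root mean square error divided by the standard deviation of the reference.

    Assumed convention: under it, always predicting the reference mean scores 1.0.
    """
    n = len(reference)
    rmse = sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
    mean = sum(reference) / n
    std = sqrt(sum((r - mean) ** 2 for r in reference) / n)
    return rmse / std


# Two made-up years of monthly values; the second year is uniformly higher.
series = [float(m) for m in range(1, 13)] + [float(m) for m in range(3, 15)]
climatology, anomaly = decompose(series)
print(climatology[:3], anomaly[0], anomaly[12])
```

Feeding a network the anomaly separately from the climatology lets it focus on departures from the seasonal cycle, which is consistent with the abstract's finding that decomposed inputs gave the best reconstruction.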
Collapse
Affiliation(s)
- Guangyu Gary Yang
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, Guangdong, China
| | - Qishuo Wang
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, Guangdong, China
| | - Jiacheng Feng
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, Guangdong, China
| | - Lechi He
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, Guangdong, China
| | - Rongzu Li
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, Guangdong, China
| | - Wenfang Lu
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, Guangdong, China; Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519000, China.
| | - Enhui Liao
- School of Oceanography, Shanghai Jiao Tong University, Shanghai 200030, China
- Zhigang Lai
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, Guangdong, China; Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519000, China
10
Kang X, Zhao Y, Yao L, Tan Z. Explainable machine learning for predicting the geographical origin of Chinese Oysters via mineral elements analysis. Curr Res Food Sci 2024; 8:100738. [PMID: 38659973 PMCID: PMC11039350 DOI: 10.1016/j.crfs.2024.100738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 04/06/2024] [Accepted: 04/12/2024] [Indexed: 04/26/2024]
Abstract
The traceability of geographic origin is essential for guaranteeing the quality, safety, and brand protection of oysters. However, current traceability results lack credibility because they do not adequately explain the model's predictions. We therefore evaluated the efficacy of explainable machine learning combined with mineral element analysis. The findings revealed that 18 elements can discriminate geographic origin. At the same time, consumers should pay closer attention to the potential risks of oyster consumption that arise from regional differences in the essential and toxic elements oysters contain. The light gradient boosting machine (LightGBM) model performed strongly, with accuracy, precision, recall, F1 score and AUC of 96.77%, 96.43%, 98.53%, 97.32% and 0.998, respectively. The SHapley Additive exPlanations (SHAP) method was used to evaluate the output of the LightGBM model, revealing differences in feature interactions among oysters from different provinces. Specifically, the features Na, Zn, V, Mg, and K were found to have a significant impact on the predictive process of the model. Consistent with existing research, the use of explainable machine learning techniques can provide insights into the complex connections between important product attributes and relevant geographical information.
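SHAP attributes a prediction to individual features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a tiny hypothetical scoring function by enumerating every feature ordering. It is only meant to show the idea: the `shap` library relies on much faster algorithms for tree models such as LightGBM, and the toy model, its coefficients, and the all-zero baseline are illustrative assumptions, not values from the study.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    names = list(x)
    phi = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the baseline input
        prev = model(current)
        for name in order:
            current[name] = x[name]       # switch one feature to its actual value
            new = model(current)
            phi[name] += new - prev       # marginal contribution in this ordering
            prev = new
    return {name: total / len(orderings) for name, total in phi.items()}

# Hypothetical score with an interaction between Na and Zn (not a fitted model).
def toy_model(f):
    return 2.0 * f["Na"] + 1.0 * f["Zn"] + 0.5 * f["Na"] * f["Zn"] - 1.0 * f["K"]

x = {"Na": 1.0, "Zn": 2.0, "K": 3.0}
baseline = {"Na": 0.0, "Zn": 0.0, "K": 0.0}
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction and the baseline prediction, and the Na-Zn interaction term is split evenly between the two features involved.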
Affiliation(s)
- Xuming Kang
- Key Laboratory of Testing and Evaluation for Aquatic Product Safety and Quality, Ministry of Agriculture and Rural Affairs, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, 266071, China
- Yanfang Zhao
- Key Laboratory of Testing and Evaluation for Aquatic Product Safety and Quality, Ministry of Agriculture and Rural Affairs, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, 266071, China
- Lin Yao
- Key Laboratory of Testing and Evaluation for Aquatic Product Safety and Quality, Ministry of Agriculture and Rural Affairs, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, 266071, China
- Zhijun Tan
- Key Laboratory of Testing and Evaluation for Aquatic Product Safety and Quality, Ministry of Agriculture and Rural Affairs, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, 266071, China
- Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, 266071, China
- Collaborative Innovation Center of Seafood Deep Processing, Dalian Polytechnic University, Dalian, 116034, China
11
Rider Z, Percich A, Hiripitiyage Y, Harris TD, Sturm BSM, Wilson AE, Pollock ED, Beaver JR, Husic A. Drivers of cyanotoxin and taste-and-odor compound presence within the benthic algae of human-disturbed rivers. WATER RESEARCH 2024; 253:121357. [PMID: 38401471 DOI: 10.1016/j.watres.2024.121357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/18/2024] [Accepted: 02/21/2024] [Indexed: 02/26/2024]
Abstract
Freshwater benthic algae form complex mat matrices that can confer ecosystem benefits but also produce harmful cyanotoxins and nuisance taste-and-odor (T&O) compounds. Despite intensive study of the response of pelagic systems to anthropogenic change, the environmental factors controlling toxin presence in benthic mats remain uncertain. Here, we present a unique dataset from a rapidly urbanizing community (Kansas City, USA) that spans environmental, toxicological, taxonomic, and genomic indicators to identify the prevalence of three cyanotoxins (microcystin, anatoxin-a, and saxitoxin) and two T&O compounds (geosmin and 2-methylisoborneol). Thereafter, we construct a random forest model informed by game theory to assess underlying drivers. Microcystin (11.9 ± 11.6 µg/m2), a liver toxin linked to animal fatalities, and geosmin (0.67 ± 0.67 µg/m2), a costly-to-treat malodorous compound, were the most abundant compounds and were present in 100 % of samples, irrespective of land use or environmental conditions. Anatoxin-a (8.1 ± 11.6 µg/m2) and saxitoxin (0.18 ± 0.39 µg/m2), while not always detected, showed a systematic tradeoff in their relative importance with season, an observation not previously reported in the literature. Our model indicates that microcystin concentrations were greatest where microcystin-producing genes were present, whereas geosmin concentrations were high in the absence of geosmin-producing genes. Together, these results suggest that benthic mats produce microcystin in situ but that geosmin production may occur ex situ with its presence in mats attributable to adsorption by organic matter. Our study broadens the awareness of benthic cyanobacteria as a source of harmful and nuisance metabolites and highlights the importance of benthic monitoring for sustaining water quality standards in rivers.
Affiliation(s)
- Zane Rider
- Department of Civil, Environmental and Architectural Engineering, University of Kansas, 2150 Learned Hall, Lawrence, KS 66045, United States
- Abigal Percich
- Department of Civil, Environmental and Architectural Engineering, University of Kansas, 2150 Learned Hall, Lawrence, KS 66045, United States
- Yasawantha Hiripitiyage
- Department of Civil, Environmental and Architectural Engineering, University of Kansas, 2150 Learned Hall, Lawrence, KS 66045, United States
- Ted D Harris
- Kansas Biological Survey, University of Kansas, Lawrence, KS 66045, United States
- Belinda S M Sturm
- Department of Civil, Environmental and Architectural Engineering, University of Kansas, 2150 Learned Hall, Lawrence, KS 66045, United States
- Alan E Wilson
- School of Fisheries, Aquaculture, and Aquatic Sciences, Auburn University, Auburn, AL 36849, United States
- Erik D Pollock
- Stable Isotope Laboratory, University of Arkansas, Fayetteville, AR 72701, United States
- John R Beaver
- BSA Environmental Services, Beachwood, OH 44122, United States
- Admir Husic
- Department of Civil, Environmental and Architectural Engineering, University of Kansas, 2150 Learned Hall, Lawrence, KS 66045, United States.
12
Wu J, Chen X, Li R, Wang A, Huang S, Li Q, Qi H, Liu M, Cheng H, Wang Z. A novel framework for high resolution air quality index prediction with interpretable artificial intelligence and uncertainties estimation. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 357:120785. [PMID: 38583378 DOI: 10.1016/j.jenvman.2024.120785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 02/02/2024] [Accepted: 03/27/2024] [Indexed: 04/09/2024]
Abstract
Accurate air quality index (AQI) prediction is essential in environmental monitoring and management. Because previous studies neglected uncertainty estimation and the need to constrain the output during prediction, we proposed a new hybrid model, TMSSICX, to forecast the AQI of multiple cities. First, time-varying filter-based empirical mode decomposition (TVFEMD) was adopted to decompose the AQI sequence into multiple intrinsic mode function (IMF) components. Second, multi-scale fuzzy entropy (MFE) was applied to evaluate the complexity of each IMF component, and the components were clustered into high- and low-frequency portions. The high-frequency portion was then decomposed a second time by successive variational mode decomposition (SVMD) to reduce volatility. Six air pollutant concentrations, namely CO, SO2, PM2.5, PM10, O3, and NO2, were used as inputs. The secondary decomposition and preliminary portion were employed as the outputs for the bidirectional long short-term memory network optimized by the snake optimization algorithm (SOABiLSTM) and improved CatBoost (ICatboost), respectively. Furthermore, extreme gradient boosting (XGBoost) was applied to ensemble the sub-model predictions into the final forecast. Finally, we introduced adaptive kernel density estimation (AKDE) for interval estimation. The empirical results indicated that the TMSSICX model outperformed the 23 other models across all datasets. Moreover, using XGBoost to ensemble the sub-model predictions led to 8.73%, 8.94%, and 0.19% reductions in RMSE compared to SVM. Additionally, by utilizing SHapley Additive exPlanations (SHAP) to assess the impact of the six pollutant concentrations on AQI, the results reveal that PM2.5 and PM10 had the most notable positive effects on the long-term trend of AQI. We hope this model can provide guidance for air quality management.
Affiliation(s)
- Junhao Wu
- State Key Laboratory of Estuarine and Coastal Research, East China Normal University, Shanghai, 200062, China
- Xi Chen
- School of Geographic Sciences, East China Normal University, Shanghai, 200241, China; Key Laboratory of Geographic Information Science, Ministry of Education, East China Normal University, Shanghai, 200241, China; Key Laboratory of Spatial-Temporal Big Data Analysis and Application of Natural Resources in Megacities, Ministry of Natural Resources, Shanghai, 200241, China.
- Rui Li
- School of Geographic Sciences, East China Normal University, Shanghai, 200241, China
- Anqi Wang
- Department of Mathematics, The University of Manchester, Manchester, M13 9PL, UK
- Shutong Huang
- School of Geographic Sciences, East China Normal University, Shanghai, 200241, China
- Qingli Li
- Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, 200241, China
- Honggang Qi
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, 100049, China
- Min Liu
- School of Geographic Sciences, East China Normal University, Shanghai, 200241, China; Key Laboratory of Geographic Information Science, Ministry of Education, East China Normal University, Shanghai, 200241, China
- Heqin Cheng
- State Key Laboratory of Estuarine and Coastal Research, East China Normal University, Shanghai, 200062, China.
- Zhaocai Wang
- College of Information, Shanghai Ocean University, Shanghai, 201306, China.
13
Liu J, Li X, Zhu P. Effects of Various Heavy Metal Exposures on Insulin Resistance in Non-diabetic Populations: Interpretability Analysis from Machine Learning Modeling Perspective. Biol Trace Elem Res 2024:10.1007/s12011-024-04126-3. [PMID: 38409445 DOI: 10.1007/s12011-024-04126-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 02/22/2024] [Indexed: 02/28/2024]
Abstract
Increasing and compelling evidence indicates that heavy metal exposure is involved in the development of insulin resistance (IR). We trained an interpretable predictive machine learning (ML) model for IR in non-diabetic populations based on levels of heavy metal exposure. A total of 4354 participants from the NHANES (2003-2020) with complete information were randomly divided into a training set and a test set. Twelve ML algorithms, including random forest (RF), XGBoost (XGB), logistic regression (LR), GaussianNB (GNB), ridge regression (RR), support vector machine (SVM), multilayer perceptron (MLP), decision tree (DT), AdaBoost (AB), Gradient Boosting Decision Tree (GBDT), Voting Classifier (VC), and K-Nearest Neighbour (KNN), were constructed for IR prediction using the training set. Among these models, the RF algorithm had the best predictive performance, showing an accuracy of 80.14%, an AUC of 0.856, and an F1 score of 0.74 in the test set. We applied three interpretability methods, permutation feature importance analysis, the partial dependence plot (PDP), and Shapley additive explanations (SHAP), to the RF model for model interpretation. Urinary Ba, urinary Mo, blood Pb, and blood Cd levels were identified as the main influencers of IR. Within a specific range, urinary Ba (0.56-3.56 µg/L) and urinary Mo (1.06-20.25 µg/L) levels exhibited the most pronounced upwards trend with the risk of IR, while blood Pb (0.05-2.81 µg/dL) and blood Cd (0.24-0.65 µg/L) levels showed a declining trend with IR. The findings on the synergistic effects demonstrated that controlling urinary Ba levels might be more crucial for the management of IR. The SHAP decision plot supported personalized care for IR based on heavy metal control. In conclusion, by utilizing interpretable ML approaches, we emphasize the predictive value of heavy metals for IR, especially Ba, Mo, Pb, and Cd.
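Permutation feature importance, one of the three interpretation methods named in the abstract above, measures how much a model's score drops when one feature column is shuffled. The following is a minimal sketch under stated assumptions: the `predict` function is a hypothetical stand-in for a trained classifier, the data are random numbers rather than NHANES measurements, and accuracy is used as the score.

```python
import random

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean and std of the accuracy drop when each column is shuffled."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    result = {}
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)              # break the link between feature j and y
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(shuffled))
        mean = sum(drops) / n_repeats
        std = (sum((d - mean) ** 2 for d in drops) / n_repeats) ** 0.5
        result[j] = (mean, std)
    return result

# Hypothetical stand-in for a trained classifier: it only looks at column 0.
def predict(row):
    return int(row[0] > 0.5)

data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [predict(row) for row in X]
importance = permutation_importance(predict, X, y)
```

Because the stand-in model ignores column 1, shuffling that column changes nothing and its importance is zero, while shuffling column 0 costs a large share of the accuracy. Repeating the shuffle is what produces a mean and a spread for each feature.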
Affiliation(s)
- Jun Liu
- Department of Gastrointestinal Surgery, The Second Affiliated Hospital of Chongqing Medical University, 74 Linjiang Road, Yuzhong District, Chongqing, 400010, China
- Xingyu Li
- Cardiovascular Medicine, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Peng Zhu
- Department of Gastrointestinal Surgery, The Second Affiliated Hospital of Chongqing Medical University, 74 Linjiang Road, Yuzhong District, Chongqing, 400010, China.
14
Talukdar S, Shahfahad, Bera S, Naikoo MW, Ramana GV, Mallik S, Kumar PA, Rahman A. Optimisation and interpretation of machine and deep learning models for improved water quality management in Lake Loktak. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 351:119866. [PMID: 38147770 DOI: 10.1016/j.jenvman.2023.119866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/28/2023] [Accepted: 12/13/2023] [Indexed: 12/28/2023]
Abstract
Loktak Lake, one of the largest freshwater lakes in Manipur, India, is critical for the eco-hydrology and economy of the region, but faces deteriorating water quality due to urbanisation, anthropogenic activities, and domestic sewage. Addressing the urgent need for effective pollution management, this study aims to assess the lake's water quality status using the water quality index (WQI) and develop advanced machine learning (ML) tools for WQI assessment and ML model interpretation to improve pollution management decision making. The WQI was assessed using entropy-based weighting arithmetic and three ML models - Gradient Boosting Machine (GBM), Random Forest (RF) and Deep Neural Network (DNN) - were optimised using a grid search algorithm in the H2O Application Programming Interface (API). These models were validated by various metrics and interpreted globally and locally via Partial Dependence Plot (PDP), Accumulated Local Effect (ALE) and SHapley Additive exPlanations (SHAP). The results show a WQI range of 72.38-100, with 52.7% of samples categorised as very poor. The RF model outperformed GBM and DNN and showed the highest accuracy and generalisation ability, which is reflected in the superior R2 values (0.97 in training, 0.9 in test) and the lower root mean square error (RMSE). RF's minimal margin of error and reliable feature interpretation contrasted with DNN's larger margin of error and inconsistency, which affected its usefulness for decision making. Turbidity was found to be a critical predictive feature in all models, significantly influencing WQI, with other variables such as pH and temperature also playing an important role. SHAP dependence plots illustrated the direct relationship between key water quality parameters such as turbidity and WQI predictions.
The novelty of this study lies in its comprehensive approach to the evaluation and interpretation of ML models for WQI estimation, which provides a nuanced understanding of water quality dynamics in Loktak Lake. By identifying the most effective ML models and key predictive features, this study provides invaluable insights for water quality management and paves the way for targeted strategies to monitor and improve water quality in this vital freshwater ecosystem.
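The abstract above names entropy-based weighting but does not give the formula. The sketch below shows the commonly used entropy weight method as one plausible reading: parameters whose values vary more across samples carry more information and receive larger weights. The sample values are made up, the function name is hypothetical, and a full WQI would also need quality ratings against a standard, which are not shown here.

```python
import math

def entropy_weights(samples):
    """Entropy weights for the columns of a samples-by-parameters table."""
    n = len(samples)
    m = len(samples[0])
    k = 1.0 / math.log(n)                 # scales entropy into the range [0, 1]
    divergence = []
    for j in range(m):
        col = [row[j] for row in samples]
        total = sum(col)
        # Share of each sample in parameter j; 0 * ln(0) is treated as 0.
        entropy = -k * sum((v / total) * math.log(v / total) for v in col if v > 0)
        divergence.append(1.0 - entropy)
    s = sum(divergence)
    return [d / s for d in divergence]

# Made-up readings, columns: turbidity, pH, temperature.
samples = [
    [5.0, 7.1, 24.0],
    [40.0, 7.3, 25.0],
    [12.0, 7.0, 26.0],
    [80.0, 7.4, 25.5],
]
weights = entropy_weights(samples)
```

In this toy table turbidity varies far more than pH or temperature, so it receives most of the weight. That outcome reflects the made-up numbers, not the study's data.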
Affiliation(s)
- Swapan Talukdar
- Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, 110025, India.
- Shahfahad
- Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, 110025, India.
- Somnath Bera
- Department of Geography, Central University of South Bihar, Gaya, Bihar, 823001, India.
- Mohd Waseem Naikoo
- Department of Geography & Disaster Management, University of Kashmir, Srinagar, Jammu & Kashmir, 190006, India.
- G V Ramana
- Department of Civil Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India.
- Santanu Mallik
- Department of Civil Engineering, National Institute of Technology, Agartala, Tripura, 799046, India.
- Potsangbam Albino Kumar
- Department of Civil Engineering, National Institute of Technology, Imphal, Manipur, 795004, India.
- Atiqur Rahman
- Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, 110025, India.
15
Uddin MG, Nash S, Rahman A, Dabrowski T, Olbert AI. Data-driven modelling for assessing trophic status in marine ecosystems using machine learning approaches. ENVIRONMENTAL RESEARCH 2024; 242:117755. [PMID: 38008200 DOI: 10.1016/j.envres.2023.117755] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/17/2023] [Revised: 10/05/2023] [Accepted: 11/20/2023] [Indexed: 11/28/2023]
Abstract
Assessing eutrophication in coastal and transitional waters is of utmost importance, yet existing Trophic Status Index (TSI) models face challenges like multicollinearity, data redundancy, inappropriate aggregation methods, and complex classification schemes. To tackle these issues, we developed a novel tool that harnesses machine learning (ML) and artificial intelligence (AI), enhancing the reliability and accuracy of trophic status assessments. Our research introduces an improved data-driven methodology specifically tailored for transitional and coastal (TrC) waters, with a focus on Cork Harbour, Ireland, as a case study. Our innovative approach, named the Assessment Trophic Status Index (ATSI) model, comprises three main components: the selection of pertinent water quality indicators, the computation of ATSI scores, and the implementation of a new classification scheme. To optimize input data and minimize redundancy, we employed ML techniques, including advanced deep learning methods. Specifically, we developed a CHL prediction model utilizing ten algorithms, among which XGBoost demonstrated exceptional performance, showcasing minimal errors during both training (RMSE = 0.0, MSE = 0.0, MAE = 0.01) and testing (RMSE = 0.0, MSE = 0.0, MAE = 0.01) phases. Utilizing a novel linear rescaling interpolation function, we calculated ATSI scores and evaluated the model's sensitivity and efficiency across diverse application domains, employing metrics such as R2, the Nash-Sutcliffe efficiency (NSE), and the model efficiency factor (MEF). The results consistently revealed heightened sensitivity and efficiency across all application domains. Additionally, we introduced a brand new classification scheme for ranking the trophic status of transitional and coastal waters. 
To assess spatial sensitivity, we applied the ATSI model to four distinct waterbodies in Ireland, comparing trophic assessment outcomes with the Assessment of Trophic Status of Estuaries and Bays in Ireland (ATSEBI) System. Remarkably, significant disparities between the ATSI and ATSEBI System were evident in all domains, except for Mulroy Bay. Overall, our research significantly enhances the accuracy of trophic status assessments in marine ecosystems. The ATSI model, combined with cutting-edge ML techniques and our new classification scheme, represents a promising avenue for evaluating and monitoring trophic conditions in TrC waters. The study also demonstrated the effectiveness of ATSI in assessing trophic status across various waterbodies, including lakes, rivers, and more. These findings make substantial contributions to the field of marine ecosystem management and conservation.
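Among the evaluation metrics listed in the abstract above, the Nash-Sutcliffe efficiency (NSE) compares a model's squared errors with those of a predictor that always returns the observed mean. A minimal sketch of the standard formula follows; the chlorophyll values are invented for illustration and the function name is an assumption.

```python
def nash_sutcliffe(observed, predicted):
    """NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Invented chlorophyll values (mg/m^3), not data from the study.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
nse = nash_sutcliffe(observed, predicted)
```

An NSE of 1 means a perfect match, 0 means the model does no better than the observed mean, and negative values mean it does worse.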
Affiliation(s)
- Md Galal Uddin
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland; Eco-HydroInformatics Research Group (EHIRG), Civil Engineering, University of Galway, Ireland.
- Stephen Nash
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland
- Azizur Rahman
- School of Computing, Mathematics and Engineering, Charles Sturt University, Wagga Wagga, Australia; The Gulbali Institute of Agriculture, Water and Environment, Charles Sturt University, Wagga Wagga, Australia
- Agnieszka I Olbert
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland; Eco-HydroInformatics Research Group (EHIRG), Civil Engineering, University of Galway, Ireland
16
Wang C, Liu J, Qiu C, Su X, Ma N, Li J, Wang S, Qu S. Identifying the drivers of chlorophyll-a dynamics in a landscape lake recharged by reclaimed water using interpretable machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 906:167483. [PMID: 37832666 DOI: 10.1016/j.scitotenv.2023.167483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Revised: 09/21/2023] [Accepted: 09/28/2023] [Indexed: 10/15/2023]
Abstract
The water quality of lakes recharged by reclaimed water is affected by both the fluctuation of reclaimed water quality and the biochemical processes in the lakes, and therefore the main controlling factors of algal blooms are difficult to identify. Taking a typical landscape lake recharged by reclaimed water as an example and using the spatiotemporal distribution characteristics and correlation analysis of water quality indexes, we propose an interpretable machine learning framework based on random forest to predict chlorophyll-a (Chl-a). The model considered nutrient difference indexes between reclaimed water and lake water, and further used feature importance ranking and partial dependence plot to identify nutrient drivers. Results show that the NO3⁻-N input from reclaimed water is the dominant nutrient driver for algal bloom especially at high temperatures, and the negative correlation between NO3⁻-N and Chl-a in the lake water is the consequence of algal bloom rather than the cause. Our study provides new insights into the identification of eutrophication factors for lakes recharged by reclaimed water.
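A partial dependence plot, as used in the abstract above to identify nutrient drivers, shows the average model prediction when one feature is forced to each value on a grid while the other features keep their observed values. The sketch below assumes a hypothetical `predict` function in place of the fitted random forest; the rows and coefficients are invented.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction with column `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)          # copy so the original data stay untouched
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(X))
    return curve

# Hypothetical stand-in for a fitted model: Chl-a rises with nitrate input,
# more steeply when the water is warm (columns: nitrate, temperature).
def predict(row):
    nitrate, temperature = row
    return 2.0 + nitrate * (0.5 if temperature < 20 else 1.5)

X = [[1.0, 15.0], [2.0, 18.0], [3.0, 25.0], [4.0, 28.0]]
curve = partial_dependence(predict, X, feature=0, grid=[0.0, 1.0, 2.0])
```

Because the curve averages over cold and warm rows, it hides the temperature interaction built into the stand-in model. This is one reason partial dependence is often read alongside feature importance or per-sample methods.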
Affiliation(s)
- Chenchen Wang
- School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China; Tianjin Key Laboratory of Aquatic Science and Technology, Tianjin Chengjian University, Tianjin 300384, China; Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Juan Liu
- School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China
- Chunsheng Qiu
- School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China; Tianjin Key Laboratory of Aquatic Science and Technology, Tianjin Chengjian University, Tianjin 300384, China.
- Xiao Su
- Tianjin Water Group Co., Ltd, Tianjin 300042, China
- Ning Ma
- Tianjin Eco-City Water Investment and Construction Ltd, Tianjin 300467, China
- Jing Li
- School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China
- Shaopo Wang
- School of Environmental and Municipal Engineering, Tianjin Chengjian University, Tianjin 300384, China; Tianjin Key Laboratory of Aquatic Science and Technology, Tianjin Chengjian University, Tianjin 300384, China
- Shen Qu
- Beijing Institute of Technology, Beijing 100081, China.
17
Wang Y, Luo Z, Luo J. Research on predicting the diffusion of toxic heavy gas sulfur dioxide by applying a hybrid deep learning model to real case data. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 901:166506. [PMID: 37619734 DOI: 10.1016/j.scitotenv.2023.166506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 07/23/2023] [Accepted: 08/21/2023] [Indexed: 08/26/2023]
Abstract
Sulfur dioxide (SO2), a toxic heavy gas, is a hazard to life and the environment. Predicting the diffusion of SO2 has become a research focus in fields such as environmental and safety studies. However, traditional methods, such as kinetic models, cannot balance accuracy and computation time, so they do not meet the needs of emergency decision-making. Deep learning (DL) models are emerging as a highly regarded solution, providing faster and more accurate predictions of gas concentrations. To this end, this study proposes an innovative hybrid DL model, the parallel-connected convolutional neural network-gated recurrent unit (PC CNN-GRU). This model utilizes two CNNs connected in parallel to process gas release and meteorological datasets, enabling the automatic extraction of high-dimensional data features and handling of long-term temporal dependencies through the GRU. The proposed model demonstrates good performance (RMSE, MAE, and R2 of 20.1658, 10.9158, and 0.9288, respectively) with real data from the Project Prairie Grass (PPG) case. Meanwhile, to address the limited availability of raw data, time-series generative adversarial networks (TimeGAN) are introduced in this study for SO2 diffusion studies for the first time, and their effectiveness is verified. To enhance the practicality of the research, the contribution of drivers to SO2 diffusion is quantified using permutation importance (PIMP) and the Sobol' method. Additionally, the maximum safe distance downwind under various conditions is visualized based on the SO2 toxicity endpoint concentration. The results of the analyses can provide a scientific basis for relevant decisions and measures.
Affiliation(s)
- Yuchen Wang
- School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China.
- Zhengshan Luo
- School of Management, Xi'an University of Architecture and Technology, Xi'an 710055, China.
- Jihao Luo
- School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
18
Li W, Huang G, Tang N, Lu P, Jiang L, Lv J, Qin Y, Lin Y, Xu F, Lei D. Effects of heavy metal exposure on hypertension: A machine learning modeling approach. CHEMOSPHERE 2023; 337:139435. [PMID: 37422210 DOI: 10.1016/j.chemosphere.2023.139435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 07/04/2023] [Accepted: 07/05/2023] [Indexed: 07/10/2023]
Abstract
Heavy metal exposure is a common risk factor for hypertension. To develop an interpretable predictive machine learning (ML) model for hypertension based on levels of heavy metal exposure, data from the NHANES (2003-2016) were employed. Random forest (RF), support vector machine (SVM), decision tree (DT), multilayer perceptron (MLP), ridge regression (RR), AdaBoost (AB), gradient boosting decision tree (GBDT), voting classifier (VC), and K-nearest neighbour (KNN) algorithms were utilized to generate an optimal predictive model for hypertension. Three interpretable methods, the permutation feature importance analysis, partial dependence plot (PDP), and Shapley additive explanations (SHAP) methods, were integrated into a pipeline and embedded in ML for model interpretation. A total of 9005 eligible individuals were randomly allocated into two distinct sets for predictive model training and validation. The results showed that among the predictive models, the RF model demonstrated the highest performance, achieving an accuracy rate of 77.40% in the validation set. The AUC and F1 score for the model were 0.84 and 0.76, respectively. Blood Pb, urinary Cd, urinary Tl, and urinary Co levels were identified as the main influencers of hypertension, and their contribution weights were 0.0504 ± 0.0482, 0.0389 ± 0.0256, 0.0307 ± 0.0179, and 0.0296 ± 0.0162, respectively. Blood Pb (0.55-2.93 μg/dL) and urinary Cd (0.06-0.15 μg/L) levels exhibited the most pronounced upwards trend with the risk of hypertension within a specific value range, while urinary Tl (0.06-0.26 μg/L) and urinary Co (0.02-0.32 μg/L) levels demonstrated a declining trend with hypertension. The findings on the synergistic effects indicated that Pb and Cd were the primary determinants of hypertension. Our findings underscore the predictive value of heavy metals for hypertension. 
By utilizing interpretable methods, we discerned that Pb, Cd, Tl, and Co emerged as noteworthy contributors within the predictive model.
Affiliation(s)
- Wenxiang Li
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China.
- Guangyi Huang
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
- Ningning Tang
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
- Peng Lu
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
| | - Li Jiang
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
| | - Jian Lv
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
| | - Yuanjun Qin
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
| | - Yunru Lin
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China
| | - Fan Xu
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China.
| | - Daizai Lei
- Department of Ophthalmology, the People's Hospital of Guangxi Zhuang Autonomous Region & Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences & Guangxi Key Laboratory of Eye Health & Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Nanning, 530021, China.
| |
Collapse
|
19
|
Bolick MM, Post CJ, Naser MZ, Mikhailova EA. Comparison of machine learning algorithms to predict dissolved oxygen in an urban stream. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023:10.1007/s11356-023-27481-5. [PMID: 37266780 DOI: 10.1007/s11356-023-27481-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/06/2022] [Accepted: 05/03/2023] [Indexed: 06/03/2023]
Abstract
Water quality monitoring in urban watersheds is critical for identifying the negative impacts of urbanization. This study sought to identify a successful predictive machine learning model with minimal parameters from easy-to-deploy, low-cost sensors to create a monitoring system for the urban stream network, Hunnicutt Creek, in Clemson, SC, USA. A multiple linear regression model was compared with four machine learning algorithms: k-nearest neighbor, decision tree, random forest, and gradient boosting. These algorithms were evaluated to understand which best predicted dissolved oxygen (DO) from water temperature, conductivity, turbidity, and water level change at four locations along the urban stream. The random forest algorithm had the highest performance in predicting DO for all four sites, with Nash-Sutcliffe model efficiency coefficient (NSE) scores > 0.9 at three sites and > 0.598 at the fourth site. The random forest model was further examined using explainable artificial intelligence (XAI), which showed that temperature influenced the DO predictions for three of the four sites, although the water quality interactions differed by site location. Calculating the land cover type in each site's sub-watershed revealed that different amounts of impervious surface and vegetation influenced water quality and the resulting DO predictions. Overall, machine learning combined with land cover data helps decision-makers better understand the nuances of urban watersheds and the relationships between urban land cover and water quality.
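For readers unfamiliar with the NSE scores quoted above, the coefficient is conventionally defined as one minus the ratio of the squared prediction error to the variance of the observations about their mean. A minimal sketch follows; the function name and the sample values are assumptions for illustration and are not data from the study.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - residual / spread


# Invented dissolved-oxygen style values, used only to exercise the function.
obs = [1.0, 2.0, 3.0, 4.0]
print(nash_sutcliffe_efficiency(obs, [1.1, 1.9, 3.2, 3.8]))
```

An NSE of 1 indicates a perfect match, and an NSE of 0 means the model does no better than always predicting the observed mean.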
Affiliation(s)
- Madeleine M Bolick, Christopher J Post, Elena A Mikhailova
- Department of Forestry and Environmental Conservation, Clemson University, Clemson, SC, 29634, USA.
- Mohannad-Zeyad Naser
- Department of Civil and Environmental Engineering & Earth Sciences, Clemson University, Clemson, SC, 29634, USA.
|
20
|
Cao J, Zhao D, Tian C, Jin T, Song F. Adopting improved Adam optimizer to train dendritic neuron model for water quality prediction. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:9489-9510. [PMID: 37161253 DOI: 10.3934/mbe.2023417] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/11/2023]
Abstract
Water quality is a continuing concern worldwide: poor water quality can cause disease and poisoning and can even endanger lives. Predicting water quality is therefore of great significance for the efficient management of water resources. However, existing prediction algorithms require long run times and have low accuracy. In recent years, neural networks have been widely used to predict water quality, and the computational power of individual neurons has attracted increasing attention. This research uses a novel dendritic neuron model (DNM) to predict water quality. In a DNM, dendrites combine synapses of different states instead of applying simple linear weighting, which gives a better fitting ability than traditional neural networks. In addition, a recent optimization algorithm called AMSGrad (Adaptive Gradient Method) has been introduced to improve the performance of the Adam dendritic neuron model (ADNM). The performance of ADNM is compared with that of traditional neural networks, and the simulation results show that ADNM outperforms them in mean square error, root mean square error, and other indicators. Furthermore, the stability and accuracy of ADNM are better than those of other conventional models. Using the trained networks, policymakers and managers can predict water quality. The real-time water quality level at a monitoring site can be presented so that measures can be taken to avoid diseases caused by water quality problems.
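The AMSGrad variant of Adam mentioned in this abstract differs from plain Adam in one step: it divides by a running maximum of the second-moment estimate rather than by the estimate itself, so the effective step size never grows. The sketch below shows that update on an invented two-parameter quadratic. It is not the authors' ADNM training code; the function names, hyperparameter defaults, and the omission of bias correction are assumptions.

```python
import math


def amsgrad_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999,
                     eps=1e-8, steps=2000):
    """Minimize a function with AMSGrad, given its gradient (no bias correction)."""
    x = list(x0)
    m = [0.0] * len(x)      # first-moment (momentum) estimate
    v = [0.0] * len(x)      # second-moment estimate
    v_hat = [0.0] * len(x)  # running maximum of v: the AMSGrad modification
    for _ in range(steps):
        g = grad_fn(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            v_hat[i] = max(v_hat[i], v[i])
            x[i] -= lr * m[i] / (math.sqrt(v_hat[i]) + eps)
    return x


def toy_gradient(x):
    """Gradient of the invented objective (x0 - 3)^2 + (x1 + 1)^2."""
    return [2 * (x[0] - 3), 2 * (x[1] + 1)]


solution = amsgrad_minimize(toy_gradient, [0.0, 0.0])
print(solution)
```

Replacing `v_hat[i]` with `v[i]` in the final update line would give the corresponding Adam step without bias correction.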
Affiliation(s)
- Jing Cao, Chenlei Tian, Ting Jin, Fei Song
- College of Science, Nanjing Forestry University, Nanjing 210037, Jiangsu, China
- Dong Zhao
- Wuxi Guotong Environmental Testing Technology, Co., Ltd, 214191, Jiangsu, China
|
21
|
Prediction of environmental factors responsible for chlorophyll a-induced hypereutrophy using explainable machine learning. ECOL INFORM 2023. [DOI: 10.1016/j.ecoinf.2023.102005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023]
|
22
|
Predicting the effects of winter water warming in artificial lakes on zooplankton and its environment using combined machine learning models. Sci Rep 2022; 12:16145. [PMID: 36167972 PMCID: PMC9515112 DOI: 10.1038/s41598-022-20604-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 09/15/2022] [Indexed: 11/08/2022]
Abstract
This work deals with the consequences of climate warming on aquatic ecosystems. The study determined the effects of increased water temperatures in artificial lakes during winter on predicted changes in the biomass of zooplankton taxa and their environment. We applied an innovative approach to investigate the effects of winter warming on zooplankton and physico-chemical factors. We used a modelling scheme combining hierarchical clustering, eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP) algorithms. Under the influence of increased water temperatures in winter, weight- and frequency-dominant Crustacea taxa such as Daphnia cucullata, Cyclops vicinus, Cryptocyclops bicolor, copepodites and nauplii, and the Rotifera: Polyarthra longiremis, Trichocerca pusilla, Keratella quadrata, Asplanchna priodonta and Synchaeta spp. tend to decrease their biomass. Under the same conditions, Rotifera: Lecane spp., Monommata maculata, Testudinella patina, Notholca squamula, Colurella colurus, Trichocerca intermedia and the protozoan species Centropyxis aculeata and Arcella discoides, which are smaller and less abundant, responded with an increase in biomass. Decreases in chlorophyll a, suspended solids and total nitrogen were predicted due to winter warming. Machine learning ensemble models used in innovative ways can contribute to the research utility of studies on the response of ecological units to environmental change.
|
23
|
Prediction and Interpretation of Water Quality Recovery after a Disturbance in a Water Treatment System Using Artificial Intelligence. WATER 2022. [DOI: 10.3390/w14152423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
In this study, an ensemble machine learning model was developed to predict the recovery rate of water quality in a water treatment plant after a disturbance. XGBoost, one of the most popular ensemble machine learning models, was used as the main framework of the model. Water quality and operational data observed in a pilot plant were used to train and test the model. A disturbance was identified when the observed turbidity exceeded the given turbidity criterion. The recovery rate of water quality at time t was therefore defined over the falling limb of the turbidity recovery period, as the ratio of the difference between the peak turbidity and the observed turbidity at time t to the difference between the peak turbidity and the turbidity criterion. The root mean square error–observation standard deviation ratio (RSR) of the XGBoost model improved from 0.730 to 0.373 after pretreatment that removed the rising-limb observations of each disturbance from the training data. Moreover, Shapley value analysis, a novel explainable artificial intelligence method, was used to provide a reasonable interpretation of the model's performance.
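The two quantities described in this abstract can be written out as a short sketch. The recovery rate follows the verbal definition given above, and the error ratio is taken in its usual form, root mean square error divided by the standard deviation of the observations. The function names and the sample turbidity values are assumptions for illustration and are not data from the study.

```python
import math


def recovery_rate(peak, observed, criterion):
    """(peak - observed at time t) / (peak - criterion), on the falling limb."""
    if peak <= criterion:
        raise ValueError("a disturbance requires peak turbidity above the criterion")
    return (peak - observed) / (peak - criterion)


def rmse_sd_ratio(observed, simulated):
    """RMSE divided by the standard deviation of the observations (lower is better)."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    error = math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)))
    spread = math.sqrt(sum((o - mean_obs) ** 2 for o in observed))
    if spread == 0:
        raise ValueError("ratio is undefined when all observations are identical")
    return error / spread  # the 1/n factors of RMSE and SD cancel


# Invented turbidity values: peak of 5.0, criterion of 1.0, reading of 3.0 at time t.
print(recovery_rate(peak=5.0, observed=3.0, criterion=1.0))
print(rmse_sd_ratio([1.0, 2.0, 3.0, 4.0], [1.2, 1.8, 3.1, 3.9]))
```

Under this definition the recovery rate is 0 at the turbidity peak and reaches 1 when the observed turbidity falls back to the criterion.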
|